Preface


Probabilistic techniques in computer programs and systems are becom- 
ing more and more widely used, for increased efficiency (as in random 
algorithms), for symmetry breaking (distributed systems) or as an unavoid- 
able artefact of applications (modelling fault-tolerance). Because interest 
in them has been growing so strongly, stimulated by their many potential 
uses, there has been a corresponding increase in the study of their correct- 
ness — for the more widespread they become, the more we will depend on 
understanding their behaviour, and their limits, exactly. 

In this volume we address that last concern, of understanding: we present 
a method for rigorous reasoning about probabilistic programs and systems. 
It provides an operational model — “how they work” — and an associated 
program logic — “how we should reason about them” — that are designed 
to fit together. The technique is simple in principle, and we hope that with 
it we will be able to increase dramatically the effectiveness of our analysis 
and use of probabilistic techniques in practice. 


Our contribution is a probabilistic calculus that operates at the level of
the program text, and it is light-weight in the sense that the amount of
reasoning is similar in size and style to what standard assertional
techniques require. In the fragment below, for example, each potential loop
entry occurs with probability 1/2; the resulting iteration establishes
x ≥ 1/2 with probability exactly p for any 0 ≤ p ≤ 1. It is thus an
implementation of the general operation choose with probability p, but it
uses only simple tests of unbiased random bits (to implement the loop
guard).

    x := p;
    while 1/2 do
        x := 2x;
        if x ≥ 1 then x := x − 1 fi
    od

It should take only a little quantitative logic to confirm that claim, and
indeed we will show that just four lines of reasoning suffice.
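
As an informal sanity check of that claim (not the four-line proof itself), the following Python sketch computes the probability directly. It assumes the fragment sets x to p, then at each guard test leaves the loop with probability 1/2 and otherwise doubles x, subtracting 1 whenever x ≥ 1. The function name and the `depth` cut-off are illustrative choices of ours.

```python
from fractions import Fraction

def prob_x_at_least_half(p, depth=64):
    """Probability that the loop ends with x >= 1/2, cut off after `depth` guard tests."""
    half = Fraction(1, 2)
    x = Fraction(p)
    stop_here = half           # probability of reaching this guard test and stopping at it
    total = Fraction(0)
    for _ in range(depth):
        if x >= half:
            total += stop_here # the loop stops now, with x >= 1/2
        x = 2 * x              # otherwise the body runs once more
        if x >= 1:
            x -= 1
        stop_here /= 2
    return total

print(prob_x_at_least_half(Fraction(3, 4)))   # 3/4
print(prob_x_at_least_half(Fraction(5, 8)))   # 5/8
```

For values of p with a finite binary expansion the result is exact; otherwise the cut-off leaves an error of at most 2^(-depth).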

Economy and precision of reasoning are what we have come to expect 
for standard programs; there is no reason we should accept less when they 
are probabilistic. 


The cover illustration comes from page 59. 
The program fragment is adapted from Fig. 7.7.10 on page 210. 
Scope and applicability


Methods for the analysis of probabilistic systems include automata, labelled 
transition systems, model checking and logic (e.g. dynamic or temporal). 
Our work falls into the last category: we overlay the Hoare-logic paradigm 
with probabilistic features imported from Markov processes, taking from 
each the essential characteristics required for a sound mathematical theory 
of refinement and proof. The aim is to accommodate modelling and analysis 
of both sequential and distributed probabilistic systems, and to allow — 
even encourage — movement between different levels of abstraction. 

Our decision to focus on logic — and a proof system for it — was moti- 
vated by our experience with logical techniques more generally: they impose 
a discipline and order which promotes clarity in specifications and design; 
the resulting proofs can often be carried out, and checked, with astonishing 
conciseness and accuracy; and the calculation rules of the logic lead to an 
algebra that captures useful equalities and inequalities at the level of the 
programs themselves. 

Although we rely ultimately on an operational model, we use it prin- 
cipally to validate the logic (and that, in turn, justifies the algebra) — 
direct reliance on the model’s details for individual programs is avoided if 
possible. (However we do not hesitate to use such details to support our 
intuition.) We feel that operational reasoning is more suited to the algorith- 
mic methods of verification used by model checkers and simulation tools 
which can, for specific programs, answer questions that are impractical for 
the general approach that a logic provides. 

Thus the impact of our approach is most compelling when applied to pro- 
grams which are intricate either in their implementation or their design, or 
have generic features such as undetermined size or other parameters. They 
might appear as probabilistic source-level portions of large sequential pro- 
grams, or as abstractions from the probabilistic modules of a comprehensive 
system-level design; we provide specific examples of both situations. In the 
latter case the ability to abstract modules’ properties has a significant effect 
on the overall verification enterprise. 


Technical features 


Because we generalise the well-established assertional techniques of specifi- 
cations, pre- and postconditions, there is a natural continuity of reasoning 
style evident in the simultaneous use of the new and the familiar ap- 
proaches: the probabilistic analysis can be deployed more, or less, as the 
situation warrants. 

A major feature is that we place probabilistic choice and abstraction 
together, in the same framework, without having to factor either of them 
out for separate treatment unless we wish to (as in fact we do in Chap. 11). 
This justifies the abstraction and refinement of our title, and is what gives 
us access to the stepwise-development paradigm of standard programming
where systems are “refined” from high levels of abstraction towards the low 
levels that include implementation detail. 

As a side-effect of including abstraction, we retain its operational
counterpart demonic choice as an explicit operator $\sqcap$ in the cut-down
probabilistic programming language pGCL which we use to describe our
algorithms; that is, the new probabilistic choice operator ${}_{p}\oplus$ refines
demonic choice rather than replacing it. In Chap. 8 we consider angelic choice
$\sqcup$ as well, which is thus a further refinement.

Probabilistic and demonic choice together allow an elementary treatment 
of the hybrid that selects “with probability at least p” (or similarly “at most 
p”), an abstraction which accurately models our unavoidable ignorance of 
exact probabilities in real applications. Thus in our mathematical model 
we are able to side-step the issue of “approximate refinement.” 

That is, rather than saying “this coin refines a fair coin with probability 
95%,” we would say “this coin refines one which is within 5% of being 
fair.” This continues the simple view that either an implementation refines 
a specification or it does not, which simplicity is possible because we have 
retained the original treatment in terms of sets of behaviours: abstraction 
is inclusion; refinement is reverse inclusion; and demonic choice is union. 
In that way we maintain the important relationship between the three 
concepts. (Section 6.5 on pp. 169ff illustrates this geometrically.) 


Organisation and intended readership 


The material is divided into three major parts of increasing specialisation, 
each of which can to a large extent be studied on its own; a fourth part 
contains appendices. We include a comprehensive index and extensive cross- 
referencing. 

Definitions of notation and explanations of standard mathematical tech- 
niques are carefully given, rather than simply assumed; they appear as 
footnotes at their first point of use and are made visually conspicuous by 
using SMALL CAPITALS for the defined terms (where grammar allows). Thus 
in many cases a glance should be sufficient to determine whether any foot- 
note contains a definition. In any case all definitions, whether or not in 
footnotes, may be retrieved by name through the index; and those with 
numbers are listed in order at page xvii. 

Because much of the background material is separated from the main 
text, the need for more advanced readers to break out of the narrative 
should be reduced. We suggest that on first reading it is better to consult 
the footnotes only when there is a term that appears to require definition 
— otherwise the many cross-references they contain may prove distracting, 
as they are designed for “non-linear” browsing once the main ideas have 
already been assimilated. 
Part I, Probabilistic guarded commands, gives enough introduction to the
probabilistic logic to prove properties of small programs such as the one 
earlier, for example at the level of an undergraduate course for Formal- 
Methods-inclined students that explains “what to do” but not necessarily 
“why it is correct to do that.” These would be people who need to un- 
derstand how to reason about programs (and why), but would see the 
techniques as intellectual tools rather than as objects of study in their own 
right. 

We have included many small examples to serve as models for the ap- 
proach (they are indexed under Programs), and there are several larger 
case studies (for example in Chap. 3). 


Part II, Semantic structures, develops in detail the mathematics on which 
the probabilistic logic is built and with which it is justified. That is, whereas
the earlier sections present and illustrate the new reasoning techniques, this 
part shows where they have come from, why they have the form they do 
and — crucially — why they are correct. 

That last point is especially important for students intending to do re- 
search in logic and semantics, as it provides a detailed and extended worked 
example of the fundamental issue of proving reasoning techniques themselves
to be correct (more accurately, “valid”), a higher-order concept than
the more familiar theme of the previous part in which we presented the 
techniques ex cathedra and used them to verify particular programs. 

This part would thus be suitable for an advanced final-year under- 
graduate or first-year graduate course, and would fit in well with other 
material on programming semantics. It defines and illustrates the use of 
many of the standard tools of the subject: lattices, approximation orders, 
fixed points, semantic injections and retractions etc. 


Part III, Advanced topics, concentrates on more exotic methods of specifi- 
cation and design, in this case probabilistic temporal/modal logics. Its final 
chapter, for example, contains material only recently discovered and leads 
directly into an up-to-date research area. It would be suitable for graduate 
students as an introduction to this specialised research community. 


Part IV includes appendices collecting material that either leads away 
from the main exposition — e.g. alternative approaches and why we have 
not taken them — or supports the text at a deeper level, such as some of 
the more detailed proofs. 

It also contains a short list of algebraic laws that demonic/probabilistic 
program fragments satisfy, generated mainly by our needs in the examples 
and proofs of earlier sections. An interesting research topic would be a 
more systematic elaboration of that list with a view to incorporating it 
into probabilistic Kleene- or omega algebras for distributed computations. 
Overall, readers seeking an introduction to probabilistic formal methods
could follow the material in order from the beginning. Those with more 
experience might instead sample the first chapter from each part, which 
would give an indication of the scope and flavour of the approach generally. 


Original sources 


Much of the material is based on published research, done with our col- 
leagues, in conference proceedings and journal articles; but here it has been 
substantially updated and rationalised — and we have done our best to 
bring the almost ten years’ worth of developing notation into a uniform 
state. 

For self-contained presentations of the separate topics, and extra 
background, readers could consult our earlier publications as shown 
overleaf. 

At the end of each chapter we survey the way in which our ideas have 
been influenced by — and in some cases adopted from — the work of other 
researchers, and we indicate some up-to-date developments. 
1. Introduction to pGCL
1.1 Sequential program logic 


Since the mid-1970’s, any serious student of rigorous program development 
will have encountered “assertions about programs” — they are predicates 
which, when inserted into program code, are supposed to be “true at that 
point of the program.” Formalised — i.e. made into a logic — they look 
like either 


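$$
\left.
\begin{array}{ll}
\{\mathit{pre}\}\ \mathit{prog}\ \{\mathit{post}\} & \text{Hoare-style} \\
\text{or}\quad \mathit{pre} \Rightarrow \mathit{wp}.\mathit{prog}.\mathit{post}\,, & \text{Dijkstra-style}
\end{array}
\right\} \tag{1.1}
$$
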
in each case meaning “from any state satisfying precondition pre, the se- 
quential program prog is guaranteed to terminate in a state satisfying 
postcondition post.”¹ Formulae pre and post are written in first-order
predicate logic over the program variables, and prog is written in a sequen- 
tial programming language. Often Dijkstra’s Guarded Command Language 
[Dij76], called GCL, is used in simple expositions like this one, since it 
contains just the essential features, and no clutter. 

A conspicuous feature of Dijkstra’s original presentation of guarded com- 
mands was the novel “demonic” choice. He explained that it arose naturally 
if one developed programs hand-in-hand with their proofs of correctness: 
if a single specification admitted say two implementations, then a third 
possibility was program code that seemed to choose unpredictably between 
the two. Yet in its pure form, where for example 


$$\mathit{prog} \;\sqcap\; \mathit{prog}' \tag{1.2}$$

is a program that can unpredictably behave either as prog or as prog’, 
this “demonic” nondeterminism seemed at first — to some — to be an 
unnecessary and in fact gratuitously confusing complication. Why would 
anyone ever want to introduce unpredictability deliberately? Programs are 
unpredictable enough already. 

If one really wanted programs to behave in some kind of “random” way, 
then more useful surely would be a construction like the 


$$\mathit{prog} \;{}_{1/2}\oplus\; \mathit{prog}' \tag{1.3}$$


that behaves as prog on half of its runs, and as prog’ on the other half. Of 
course on any particular run the behaviour is unpredictable, and even over 
many runs the proportions will not necessarily be exactly “50/50” — but 
over a long enough period one will find approximately equal evidence of 
each behaviour. 

A logic and a model for programs like (1.3) was in fact provided in the 
early 1980’s [Koz81, Koz85], where in the “Kozen style” the pre- and post-formulae
became real- rather than Boolean functions of the state, and $\sqcap$
was replaced by ${}_{p}\oplus$ in the programming language. Those logical statements


¹ We will use the Dijkstra-style.
(1.1) now took on a more general meaning, that “if program prog is run
many times from the same initial state, the average value of post in the 
resulting final states is at least the actual value that pre had in the initial 
state.” Naturally we are relying on the expressions’ pre and post having 
real- rather than Boolean type when we speak of their average, or expected 
value. 

The original — standard, we call it — Boolean logic was still available 
of course via the embedding false, true $\mapsto$ 0, 1.


Dijkstra’s demonic $\sqcap$ was not so easily discarded, however. Far from being
“an unnecessary and confusing complication,” it is the very basis of what 
is now known as refinement and abstraction of programs. (The terms are 
complementary: an implementation refines its specification; a specification 
abstracts from its implementation.) To specify “set r to a square-root of 
s” one could write directly in the programming language GCL 


$$r := -\sqrt{s} \;\;\sqcap\;\; r := +\sqrt{s}\,, \tag{1.4}$$ ²


something that had never been possible before. This explicit, if acciden- 
tal, “programming feature” caught the tide that had begun to flow in that 
decade and the following: the idea that specifications and code were merely 
different ways of describing the same thing (as advocated by Abrial, Hoare 
and others; making an early appearance in Back’s work [Bac78] on what be- 
came the Refinement Calculus [Mor88b, Bac88, Mor87, Mor94b, BvW98]; 
and as found at the heart of specification and development methods such 
as Z [Spi88] and VDM [Jon86]). 

Unfortunately, probabilistic formalisms were left behind, and did not embrace
the new idea: replacing $\sqcap$ by ${}_{p}\oplus$, they lost demonic choice; without
demonic choice, they lost abstraction and refinement; and without those, 
they had no nontrivial path from specification to implementation, and no 
development calculus or method. 


² Admittedly this is a rather clumsy notation when compared with those designed
especially for specification, e.g. 


    $r : [\,r^2 = s\,]$      a specification statement (Back, Morgan, Morris)
    $(r')^2 = s$      (the body of) a Z schema (Abrial, Oxford)
T sg VDM (Bjørner, Jones) 


    any $r'$ with $(r')^2 = s$ then $r := r'$ end      a generalised substitution (Abrial)


But the point is that the specification could be written in a “programming language” at 
all: it was beginning to be realised that there was no reason to distinguish the meanings 
of specifications and of programs (a point finally crystallised in the subtitle Assigning 
Programs to Meanings of Abrial’s book [Abr96a], itself a reference 30 years further back 
to Floyd's paper [Flo67] where it all began). 
To have a probabilistic development method, we need both $\sqcap$ and ${}_{p}\oplus$
— we cannot abandon one for the other. Using them together, we can for 
example describe “flip a nearly fair coin” as 


$$c := \text{heads} \;{}_{0.49}\oplus\; \text{tails} \quad\sqcap\quad c := \text{heads} \;{}_{0.51}\oplus\; \text{tails}\,.$$


What we are doing here is specifying a coin which is within 1% of being
fair (just as well, since perfect ${}_{0.5}\oplus$ coins do not exist in nature, and so
we could never implement a specification that required one).³ This program
abstracts, slightly, from the precise probability of heads or tails. 
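
One informal way to picture this is to treat each coin as the set of heads-probabilities it may exhibit. The Python sketch below makes that concrete under a simplifying assumption of ours: a coin is represented by a closed interval of heads-probabilities, and one coin refines another when its interval is contained in the other's. The names `coin` and `refines` are hypothetical helpers, not part of pGCL.

```python
from fractions import Fraction

def coin(low, high=None):
    """Heads-probabilities a coin may exhibit, as a closed interval (low, high)."""
    low = Fraction(low)
    return (low, low if high is None else Fraction(high))

def refines(spec, impl):
    """impl refines spec when every behaviour of impl is allowed by spec."""
    return spec[0] <= impl[0] and impl[1] <= spec[1]

nearly_fair = coin("0.49", "0.51")          # the specification above
print(refines(nearly_fair, coin("0.5")))    # True
print(refines(nearly_fair, coin("0.52")))   # False
print(refines(coin("0.5"), nearly_fair))    # False
```

The last line reflects the remark above: a real coin known only to be within 1% of fair cannot be shown to implement a specification that demands exact fairness.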

In this introduction we will see how the seminal ideas of Floyd, Hoare, 
Dijkstra, Abrial and others can be brought together and replayed in 
the probabilistic context suggested by Kozen, and how the milestones of 
sequential program development and refinement — the concepts of 


• program assertions;

• loop invariants;

• loop variants;

• program algebra (e.g. monotonicity and conjunctivity)


— can be generalised to include probability. Our simple programming language
will be Dijkstra's, but with ${}_{p}\oplus$ added and, crucially, demonic
choice $\sqcap$ retained: we call it pGCL.


Section 1.2 gives a brief overview of pGCL and its use of so-called expec- 
tations rather than predicates in its accompanying logic; Section 1.3 then 
supplies operational intuition by relating pGCL operationally to a form of 
gambling game. (The rigorous operational semantics is given in Chap. 5, 
and a deeper connection with games is given in Chap. 11.) Section 1.4 
completes the background by reviewing elementary probability theory. 

Section 1.5 gives the precise syntax and expectation-transformer seman- 
tics of pGCL, using the infamous “Monty Hall” game as an example. 
Finally, in Sec. 1.6 we make our first acquaintance with the algebraic 
properties of pGCL programs. 


Throughout we write $f.x$ instead of $f(x)$ for function application of $f$ to
argument $x$, with left association so that $f.g.x$ is $(f(g))(x)$; and we use “:=”
for is defined to be. For syntactic substitution we write $expr\langle var \mapsto term\rangle$


³ That means that probabilistic formalisms without abstraction in their specifications
must introduce probability into their refinement operator if they are to be of any practical
use: writing for example $prog \sqsubseteq_{0.99} prog'$ can be given a sensible meaning even if the
probability in prog is exact [DGJP02, vVBMOW03, Yin03]. But we do not follow that
path here.




to indicate replacing var by term in expr. We use “overbar” to indicate
complement both for Booleans and probabilities: thus $\overline{\text{true}}$ is false, and $\overline{p}$ is
$1-p$.


1.2 The programming language pGCL 


We’ll use square brackets $[\cdot]$ to convert Boolean-valued predicates to arithmetic
formulae which, for reasons explained below, we call expectations.
Stipulating that [false] is zero and [true] is one makes [P] in a trivial
sense the probability that a given predicate P holds: if false, it holds with
probability zero; if true, it holds with probability one.⁴

For our first example, consider the simple program 


$$x := -y \;\;{}_{1/3}\oplus\;\; x := +y \tag{1.5}$$


over integer variables $x, y : \mathbb{Z}$, using the new construct ${}_{1/3}\oplus$ which we interpret
as “choose the left branch $x := -y$ with probability 1/3, and choose the
right branch with probability 1 − 1/3.”

Recall [Dij76] that for any predicate post over final states, and a standard 
command prog,⁵ the “weakest precondition” predicate wp.prog.post acts
over initial states: it holds just in those initial states from which prog is 
guaranteed to reach post. Now suppose prog is probabilistic, as Program 
(1.5) is; what can we say about the probability that wp.prog.post holds in 
some initial state? 

It turns out that the answer is just wp.prog.[post], once we generalise 
wp.prog to expectations instead of predicates. For that, we begin with the 
two definitions⁶

$$wp.(x := E).postE \;:=\; \text{“$postE$ with $x$ replaced everywhere by $E$”} \tag{1.6}$$ ⁷

$$wp.(prog \;{}_{p}\oplus\; prog').postE \;:=\; p * wp.prog.postE \;+\; \overline{p} * wp.prog'.postE\,, \tag{1.7}$$

in which postE is an expectation, and for our example program we ask what
is the probability that the predicate “the final state will satisfy $x \ge 0$” holds
in some given initial state of the program (1.5)?

To find out, we calculate wp.prog.[post] using the definitions above; that 
is 


⁴ Note that this nicely complements our “overbar” convention, because for any
predicate P the two expressions $[\overline{P}]$ and $\overline{[P]}$ are therefore the same.

⁵ Throughout we use STANDARD to mean “non-probabilistic.”

⁶ Here we are defining the language as we go along; but all the definitions are collected
together in Fig. 1.5.3 (p. 26).

⁷ In the usual way, we take account of free and bound variables, and if necessary
rename to avoid variable capture.
$$
\begin{array}{lll}
 & wp.(x := -y \;{}_{1/3}\oplus\; x := +y).[x \ge 0] \\[2pt]
\equiv{}^{8} & (1/3) * wp.(x := -y).[x \ge 0] \;+\; (2/3) * wp.(x := +y).[x \ge 0] & \text{using (1.7)} \\[2pt]
\equiv & (1/3)\,[-y \ge 0] \;+\; (2/3)\,[+y \ge 0] & \text{using (1.6)} \\[2pt]
\equiv & [y < 0]/3 \;+\; [y = 0] \;+\; 2[y > 0]/3\,. & \text{using arithmetic}
\end{array}
$$


Thus our answer is the last arithmetic formula above, which we call a “pre- 
expectation” — and the probability we seek is found by reading off the 
formula’s value for various initial values of y, getting 


when y < 0,    1/3 + 0 + 2(0)/3  =  1/3
when y = 0,    0/3 + 1 + 2(0)/3  =  1
when y > 0,    0/3 + 0 + 2(1)/3  =  2/3


Those results indeed correspond with our operational intuition about the 
effect of ${}_{1/3}\oplus$.
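
The calculation above can be replayed mechanically. The sketch below is our own minimal Python model, not the book's definition of pGCL: states are dictionaries, expectations are functions from states to numbers, and each program is represented by its expectation transformer, following (1.6) and (1.7). The helper names `assign`, `pchoice` and `embed` are illustrative assumptions.

```python
from fractions import Fraction

def assign(var, expr):
    """wp of `var := expr`, as in (1.6): evaluate post in the updated state."""
    return lambda post: lambda s: post({**s, var: expr(s)})

def pchoice(p, left, right):
    """wp of `left p(+) right`, as in (1.7)."""
    return lambda post: lambda s: p * left(post)(s) + (1 - p) * right(post)(s)

def embed(pred):
    """The bracket [pred]: 1 where pred holds, 0 elsewhere."""
    return lambda s: Fraction(int(pred(s)))

# Program (1.5): x := -y with probability 1/3, otherwise x := +y.
program_1_5 = pchoice(Fraction(1, 3),
                      assign("x", lambda s: -s["y"]),
                      assign("x", lambda s: +s["y"]))

pre = program_1_5(embed(lambda s: s["x"] >= 0))
for y in (-2, 0, 5):
    print(y, pre({"x": 0, "y": y}))   # 1/3, then 1, then 2/3
```

Using exact fractions rather than floats keeps the printed values identical to the hand calculation.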


For our second example we illustrate abstraction from probabilities: a 
demonic version of Program (1.5) is much more realistic in that we set 
its probabilistic parameters only within some tolerance. We say informally 
(but still precisely) that 


• $x := -y$ is to be executed with
  probability at least 1/3,

• $x := +y$ is to be executed with                                    (1.8)
  probability at least 1/4 and

• it is certain that one or the other
  will be executed.


Equivalently we could say that alternative $x := -y$ is executed with probability
between 1/3 and 3/4, and that otherwise $x := +y$ is executed
(therefore with probability between 1/4 and 2/3).

With demonic choice we can write Specification (1.8) as 


$$x := -y \;{}_{1/3}\oplus\; x := +y \quad\sqcap\quad x := -y \;{}_{3/4}\oplus\; x := +y\,, \tag{1.9}$$


because we do not know or care whether the left or right alternative of $\sqcap$
is taken; and it may even vary from run to run of the program, resulting
in an “effective” ${}_{p}\oplus$ with $p$ somewhere between the two extremes.⁹


⁸ Later we explain the use of “$\equiv$” rather than “=”.

⁹ We will see later that a convenient notation for (1.9) uses the abbreviation

$$prog \;{}_{p}\oplus_{q}\; prog' \;:=\; prog \;{}_{p}\oplus\; prog' \;\;\sqcap\;\; prog' \;{}_{q}\oplus\; prog\,;$$

we would then write it $x := -y \;{}_{1/3}\oplus_{1/4}\; x := +y$, or even $x := -y \;{}_{1/3}\oplus_{1/4}\; +y$.
To treat Program (1.9) we need a third definition,

$$wp.(prog \sqcap prog').postE \;:=\; wp.prog.postE \;\min\; wp.prog'.postE\,, \tag{1.10}$$


using min because we regard demonic behaviour as attempting to make the 
achieving of post as improbable as it can. Repeating our earlier calculation 
(but more briefly) gives this time 


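$$
\begin{array}{lll}
 & wp.(\text{Program (1.9)}).[x \ge 0] \\[2pt]
\equiv & ([y \le 0]/3 + 2[y \ge 0]/3) \;\min\; (3[y \le 0]/4 + [y \ge 0]/4) & \text{using (1.6), (1.7), (1.10)} \\[2pt]
\equiv & [y < 0]/3 \;+\; [y = 0] \;+\; [y > 0]/4\,. & \text{using arithmetic}
\end{array}
$$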


Our interpretation has become 


• When y is initially negative, a demon chooses the left branch of $\sqcap$
  because that branch is more likely (2/3 vs. 1/4) to execute $x := +y$;
  the best we can say then is that $x \ge 0$ will hold with probability
  at least 1/3.

• When y is initially zero, a demon cannot avoid $x \ge 0$: either way
  the probability of $x \ge 0$ finally is one.

• When y is initially positive, a demon chooses the right branch because
  that branch is more likely to execute $x := -y$; the best we can say
  then is that $x \ge 0$ finally with probability at least 1/4.


The same interpretation holds if we regard $\sqcap$ as abstraction instead of
as run-time demonic choice. Suppose Program (1.9) represents some mass- 
produced physical device and, by examining the production method, we 
have determined the tolerance (1.8) we can expect from a particular factory. 
If we were to buy one from the warehouse, all we could conclude about its 
probability of establishing $x \ge 0$ is just as calculated above.
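
The three cases can be checked numerically. The following Python sketch specialises definitions (1.7) and (1.10) to the post-expectation [x ≥ 0] and to the two branches of Program (1.9); the function names are ours, chosen only for illustration.

```python
from fractions import Fraction

def wp_branch(p, y):
    """Pre-expectation of [x >= 0] for `x := -y  p(+)  x := +y` at initial y."""
    return p * (1 if -y >= 0 else 0) + (1 - p) * (1 if y >= 0 else 0)

def wp_program_1_9(y):
    """Demonic choice, as in (1.10): the demon picks the smaller of the two values."""
    return min(wp_branch(Fraction(1, 3), y), wp_branch(Fraction(3, 4), y))

for y in (-2, 0, 5):
    print(y, wp_program_1_9(y))   # 1/3, then 1, then 1/4
```

Replacing `min` by `max` would give the corresponding optimistic (angelic) bound instead, which is not what a specification may rely on.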


Refinement is the converse of abstraction: we have 


Definition 1.2.1 PROBABILISTIC REFINEMENT For two programs prog,
prog’ we say that prog’ is a refinement of prog, written $prog \sqsubseteq prog'$,
whenever for all post-expectations postE we have

$$wp.prog.postE \;\Rrightarrow\; wp.prog'.postE\,. \tag{1.11}$$


We use the symbol $\Rrightarrow$ for $\le$ (extended pointwise) between expectations,
which emphasises the similarity between probabilistic- and standard
refinement.¹⁰


¹⁰ We are aware that “$\Rrightarrow$” looks more like “$\ge$” than it does “$\le$”; but for us its
resemblance to “$\Rightarrow$” is the important thing.
From (1.11) we see that in the special case when expectation postE is
an embedded predicate [post], the meaning of $\Rrightarrow$ ensures that a refinement
prog’ of prog is at least as likely to establish post as prog is.¹¹ That accords
with the usual definition of refinement for standard programs: for then
we know wp.prog.[post] is either zero or one, and whenever prog is certain
to establish post (whenever wp.prog.[post] = 1) we know that prog’ also is
certain to do so (because then $1 \Rrightarrow wp.prog'.[post]$).


For our third example we prove a refinement: consider the program 


$$x := -y \;\;{}_{1/2}\oplus\;\; x := +y\,, \tag{1.12}$$


which clearly satisfies Specification (1.8); thus it should refine Program 
(1.9), which is just that specification written in pGCL. With Definition 
(1.11), we find for any postE that 


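$$
\begin{array}{lll}
 & wp.(\text{Program (1.12)}).postE \\[2pt]
\equiv & wp.(x := -y).postE/2 \;+\; wp.(x := +y).postE/2 & \text{definition ${}_{p}\oplus$, at (1.7)} \\[2pt]
\equiv & postE^{-}/2 \;+\; postE^{+}/2 & \text{introduce abbreviations} \\[2pt]
\equiv & (3/5)(postE^{-}/3 + 2\,postE^{+}/3) \;+\; (2/5)(3\,postE^{-}/4 + postE^{+}/4) & \text{arithmetic} \\[2pt]
\Lleftarrow & (postE^{-}/3 + 2\,postE^{+}/3) \;\min\; (3\,postE^{-}/4 + postE^{+}/4) & \text{any linear combination exceeds min} \\[2pt]
\equiv & wp.(\text{Program (1.9)}).postE\,.
\end{array}
$$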


The refinement relation (1.11) is indeed established for the two programs. 
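
A numerical spot-check of this inequality is easy to write. The Python sketch below is an illustration under our own simplifications: since both programs only ever assign −y or +y to x, a post-expectation matters only through its two values at those outcomes, here called `minus` and `plus`. It samples a finite grid of such values, so it supports the proof above rather than replacing it.

```python
from fractions import Fraction
from itertools import product

def wp_1_12(minus, plus):
    """wp of Program (1.12), given postE at x = -y (`minus`) and at x = +y (`plus`)."""
    return minus / 2 + plus / 2

def wp_1_9(minus, plus):
    """wp of Program (1.9): minimum over its two probabilistic branches."""
    return min(minus / 3 + 2 * plus / 3, 3 * minus / 4 + plus / 4)

samples = [Fraction(n, 4) for n in range(0, 13)]   # postE values 0, 1/4, ..., 3
refined = all(wp_1_9(m, p) <= wp_1_12(m, p) for m, p in product(samples, samples))
print(refined)   # True
```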

The introduction of 3/5 and 2/5 in the third step can be understood
by noting that demonic choice $\sqcap$ can be implemented by any probabilistic
choice whatever: in this case we used ${}_{3/5}\oplus$. Thus a proof of refinement using
program algebra might read


Program (1.12)
= $x := -y \;{}_{1/2}\oplus\; x := +y$


¹⁰ Similar conflicts of interest arise when logicians use “$\supset$” for implies although, interpreted
set-theoretically, implies is in fact “$\subseteq$”. And then there is “$\sqsubseteq$” for refinement,
which corresponds to “$\supseteq$” of behaviours.

¹¹ We see later in this chapter, however, and in Sec. A.1, that it is not sound to consider
only post-expectations postE of the form [post] in Def. 1.2.1: it is necessary for refine- 
ment, but not sufficient, that prog’ be at least as likely to establish any postcondition 
post as prog is. 
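$$
\begin{array}{lll}
= & (x := -y \;{}_{1/3}\oplus\; x := +y) \;\;{}_{3/5}\oplus\;\; (x := -y \;{}_{3/4}\oplus\; x := +y) & \text{Sec. B.1 Law 4} \\[2pt]
\sqsupseteq & (x := -y \;{}_{1/3}\oplus\; x := +y) \;\;\sqcap\;\; (x := -y \;{}_{3/4}\oplus\; x := +y) & \text{$(\sqcap) \sqsubseteq ({}_{p}\oplus)$ for any $p$}\;{}^{12} \\[2pt]
= & \text{Program (1.9)}\,.
\end{array}
$$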


1.3 An informal computational model: 
pGCL describes gambling 


We now use a simple card-and-dice game as an informal introduction to the 
computational model for pGCL, to support the intuition for probabilistic 
choice, demonic choice and their interaction. To start with, we consider the 
simplest case: non-looping programs without $\sqcap$ or ${}_{p}\oplus$.


1.3.1 The standard game 


Imagine we have a board of numbered squares, and a selection of numbered 
cards laid on it with at most one card per square; winning squares are 
indicated by coloured markers. The squares are the program states; the 
program is the pattern of numbered cards; the coloured markers indicate 
the postcondition. 

To play the game 


An initial square is chosen (according to certain rules which do
not concern us); subsequently
• if the square contains a card the card is removed, and play
  continues from the square whose number appeared on the
  card, and
• if the square does not contain a card, the game is over.
When the game is over the player has won if his final square
contains a marker; otherwise he has lost.


This simple game is deterministic: any initial state always leads to the 
same final state. And because the cards are removed after use it is also guar- 
anteed to terminate, if the board is finite. It is easily generalised however 
to include other features of standard programs: 


¹² By $(\sqcap) \sqsubseteq ({}_{p}\oplus)$ we mean that for all prog, prog’ we have

$$prog \sqcap prog' \;\sqsubseteq\; prog \;{}_{p}\oplus\; prog'\,,$$

which is an instance of our Law 7 given on p. 323, in Sec. B.1 on program algebra.
looping If the cards are not removed after use, the game can “loop.” A
looping-forever player loses. 


aborting If a card reads go to jail, the program is said to “abort” and 
the player can be sent to any square whatever, including a special 
supplementary “jail” square from which there is no escape. A jailed 
player loses. 


demonic nondeterminism If each square can contain several cards, face- 
down, and the rules are modified so that the next state is determined 
by choosing just one of them “blind,” then play is nondeterministic. 
Taking the demonic (pessimistic) view, the player should expect to 
lose unless he is guaranteed to reach a winning position no matter 
which blind choices he makes. 


In the standard game, for each (initial) square one can examine the cards 
before playing to determine whether a win is guaranteed from there. But 
once the game has started, the cards are turned face-down. 


The set of squares from which a win is guaranteed is the weakest 
precondition.¹³
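
To make the game reading of the weakest precondition concrete, here is a small Python sketch. The board is a hypothetical example of ours, not one from the text: `cards` maps each square to the successor squares written on the cards lying there (several cards model demonic choice, no cards means the game ends), cards are not removed so play may loop, and the “go to jail” rule is left out. A win is guaranteed only if every blind choice leads to a guaranteed win, and looping forever counts as a loss.

```python
def guaranteed_win_squares(cards, markers):
    """Squares from which the player wins whatever cards are drawn (least fixed point)."""
    # Base case: the game ends at once on a marked square.
    win = {sq for sq, succs in cards.items() if not succs and sq in markers}
    changed = True
    while changed:
        changed = False
        for sq, succs in cards.items():
            # A square with cards is winning only if all its cards lead to winning squares.
            if sq not in win and succs and all(t in win for t in succs):
                win.add(sq)
                changed = True
    return win

# Hypothetical board: square 2 loops forever, square 5 ends without a marker.
cards = {0: [1, 2], 1: [3], 2: [2], 3: [], 4: [3], 5: []}
markers = {3}
print(sorted(guaranteed_win_squares(cards, markers)))   # [1, 3, 4]
```

Square 0 is excluded because one of its two cards leads to the looping square 2, even though the other card leads to a win.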


1.3.2 The probabilistic game 


Suppose now that each card contains not just one but, rather, a list of 
successor squares, and the choice from the list is made by rolling a die. In 
this deterministic game,¹⁴ play becomes a succession of die rolls, taking the
player from square to square; termination (no card) and winning (marker) 
are defined as before. 

When squares can contain several cards face down, each with a separate 
list of successors to be resolved by die roll, we are dealing with probability 
and demonic nondeterminism together: first the card is chosen “blind” (i.e. 
demonically); the card is turned over and a die roll (probability) determines 
which of its listed alternatives to take. 

In the probabilistic game one can ask for the greatest guaranteed prob- 
ability of winning; as in the standard case, the prediction will vary 
depending on the initial square. (It’s because of demonic nondeterminism, 
as illustrated below, that the probability might be only a lower bound.) 


¹³ A glance at Fig. 6.7.1 (p. 173) will show where we are headed in the visualisation of
probabilistic preconditions! 

14Note that we still call this game “deterministic,” in spite of the probabilistic choices, 
and there are good mathematical reasons for doing so. (In Chap. 5, for example, we see 
that such programs are maximal in the refinement order.) An informal justification is 
that deterministic programs are those with repeatable behaviours and, even for proba- 
bilistic programs, the output distribution is repeatable (to within statistical confidence 
measures) provided the program contains no demonic choice; see e.g. p. 135. 
In Fig. 1.3.1 is an example game illustrating some of the above points. 
The greatest guaranteed probability of winning from initial state 0 is only 
1/2, in spite of the fact that the player can win every time if he is lucky 
enough to choose the first card in the pile; but he might be unlucky enough 
never to choose the first card, and we must assume the worst. 


1.3.3 Expected winnings in the probabilistic game 


For standard programs, the computational model of execution supports a 
complementary, “logical” view — given a set of final states (the postcon- 
dition) we can examine the program to determine the largest set of initial 
states (the weakest precondition) from which execution of the program 
is guaranteed to reach the designated final states. The sets of states are 
predicates, and the program is being regarded as a predicate transformer. 

Regarding sets of states as characteristic functions (from the state space 
into {0,1}), we generalise to “probabilistic predicates” by extending the 
range of those functions to all of $\mathbb{R}_{\geq}$, the non-negative reals.15

Probabilistic programs become functions from probabilistic postcon- 
ditions to probabilistic weakest preconditions — we call them post- 
expectations and greatest pre-expectations. The corresponding generalisa- 
tion in the game is as follows. 

Rather than placing winning markers on the board, we place money — 
rather than strictly winning or losing, the player simply keeps whatever 
money he finds in his final square. In Fig. 1.3.2 we show the effect of 
translating our original game. In fact, not much changes: the probability 
of winning (in Fig. 1.3.1) translates into the equivalent expected payoff 
(Fig. 1.3.2) as the corresponding fraction of £1, illustrating this important 
fact: 


The expected value of a characteristic function over a distri- 
bution is the same as the probability assigned to the set that 
function describes. 


Thus using expectations is at least as general as using probabilities explicitly,
since we can always restrict ourselves to $\{0,1\}$-valued functions, from 
which probabilities are then recovered. 

For probabilistic programs, the operational interpretation of execution 
thus supports a “logical” view also: given a function from final states 
to $\mathbb{R}_{\geq}$ (the post-expectation), one can examine the program beforehand 
to determine for each initial state the minimum expected (or “average”)
win when the game is played repeatedly from there (the greatest 
pre-expectation), which is therefore also a function from states to $\mathbb{R}_{\geq}$.
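
As a small illustration of that “logical” view, the following Python sketch computes the greatest pre-expectation of a single move on a made-up board. The board, the square names and the helper functions are all hypothetical (this is not the game of Fig. 1.3.1); the only ideas taken from the text are that the demon picks the card, the die resolves the card, and we record the least expected payoff.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Hypothetical board: each square holds a pile of face-down cards, and each
# card is a distribution {successor square: probability}.  A square with no
# cards is final.
board = {
    "a": [{"win": Fraction(1)}, {"win": HALF, "lose": HALF}],  # two cards: demonic
    "b": [{"win": HALF, "lose": HALF}],                         # one card: deterministic
    "win": [],
    "lose": [],
}

# Post-expectation: the money lying on each square.
payoff = {"a": Fraction(0), "b": Fraction(0),
          "win": Fraction(1), "lose": Fraction(0)}


def expected(card, value):
    """Expected value of `value` over the distribution printed on one card."""
    return sum(p * value[square] for square, p in card.items())


def one_move_pre_expectation(square, value):
    """Least expected payoff over all cards that might be picked 'blind'."""
    cards = board[square]
    if not cards:               # no card: play has already terminated here
        return value[square]
    return min(expected(card, value) for card in cards)


pre = {square: one_move_pre_expectation(square, payoff) for square in board}
```

From square "a" the player could be lucky and always draw the first card, but the guarantee is only the minimum over both cards, which mirrors the discussion of initial state 0 above.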


15In later chapters we will be more precise about the range of expectations, requiring 
them in particular to be bounded above. 
To play from a square, you first pick one of the face-down cards. (In the diagram, 
we are seeing what's on the cards with our x-ray vision.) Then you roll a die to 
choose one of the alternatives on the card. (In this case the die is two-sided, i.e. 
it is a coin.) 

As special cases, a standard step (non-probabilistic) has only one alternative per 
card, but possibly many cards; and a deterministic step has only one card, but 
possibly many alternatives on it. A standard and deterministic step has one card, 
and only one alternative. 


The winning final positions — the postcondition — are the states {4, 5}, marked 
with a £1 coin. From initial state 2 a win is guaranteed; from state 0 or 1 the 
minimum guaranteed probability of winning is 1/2; from state 3 the minimum 
probability is zero, since the second card might be chosen every time. 


The probabilities are summarised in Fig. 1.3.2. 


Figure 1.3.1. CARD-AND-DICE GAME OPERATIONAL SEMANTICS FOR pGCL 




The post-expectation: 

Final state                           |  0  |  1  |  2  |  3  |  4  |  5  |  6
Payoff awarded if this state reached  |  0  |  0  |  0  |  0  | £1  | £1  |  0


The probability of winning (ending on a £1) (from Fig. 1.3.1): 

Initial state                               |  0  |  1  |  2  |  3  |  4  |  5  |  6
Greatest guaranteed probability of winning  | 1/2 | 1/2 |  1  |  0  |  1  |  1  |  0


The greatest pre-expectation: 

Initial state                        |  0  |  1  |  2  |  3  |  4  |  5  |  6
Greatest guaranteed expected payoff  | 50p | 50p | £1  |  0  | £1  | £1  |  0


Figure 1.3.2. A PROBABILISTIC AND NONDETERMINISTIC GAMBLING GAME 


Since the functions are expectations, the program is being regarded as an 
expectation transformer.16


We are not limited to £1 coins for indicating postconditions — that is 
only an artefact of embedding standard postconditions into the probabilis- 
tic world. In general any amount of money can be placed in a square, and 
that is the key to allowing a smooth sequential composition of programs 
at the logical level — for if the program game of Fig. 1.3.2 were executed 
after some other program prog, the precondition of the two together with 
respect to the postcondition {4,5} would be calculated by applying wp.prog 
to the greatest pre-expectation table for game. That is because sequential 
composition of programs becomes, as usual, functional composition of the 
corresponding transformers: we have 


$$ wp.(prog; game).\{4,5\} \;:=\; wp.prog.(\overbrace{wp.game.\{4,5\}}^{\text{expected win table}}) \;, $$


and that table contains non-integer values (for example 50p). 

Another reason for allowing arbitrary values in $\mathbb{R}_{\geq}$ is that using only 
standard postconditions ($\{0,1\}$-valued), or equivalently using explicit
probabilities (recall the important fact above), is not discriminating enough 
when nondeterminism is present: certain programs are identified that 
should be distinguished, and the semantics becomes non-compositional. 
(See Sec. A.1 for why this happens.) 


16 For deterministic (yet probabilistic) programs, the card-game model and the associ- 
ated transformers are essentially Kozen’s original construction [Koz81, Koz85]. We have 
added demonic (and later angelic) nondeterminism. 
1.4 Behind the scenes: elementary probability 
theory 


In probability theory, an event is a subset of some given sample space S, so 
that the event is said to have occurred if the sampled value is in that set; 
a probability distribution Pr over the sample space is a function from its 
events into the closed interval [0, 1], giving for each event the probability 
of its occurrence. In the general case, for technical reasons, not necessarily 
all subsets of the sample space are events.17

In our case we consider countable sample spaces, and take every (sub-)set 
of S to be an event — and so we can regard a probability distribution more 
simply as a function from S directly to probabilities (rather than from its 
subsets). Thus $Pr\colon S \to [0,1]$, and the probability of a more general event is 
now just the sum of the probabilities of its elements: we are using discrete 
distributions.18

A random variable X is a function from the sample space to the non-negative
reals;19 and the expected value Exp.X of that random variable is 
defined in terms of the (discrete) probability distribution Pr; we have the 
summation 


$$ Exp.X \;:=\; (\sum s\colon S \,\cdot\, Pr.s * X.s) \,.^{20} \qquad (1.13) $$


It represents the “average” value of X.s over many repeated samplings 
of s according to the distribution Pr.21
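
The summation can be tried out directly. The sketch below is only an illustration: the fair six-sided die and the names `exp`, `die` and `is_even` are our assumptions, not part of the text. It evaluates Exp.X for a discrete distribution held in a dictionary, and also checks the “important fact” that the expected value of a characteristic function is the probability of the event it describes.

```python
from fractions import Fraction


def exp(pr, x):
    """Exp.X as the sum over s in S of Pr.s * X.s, for a discrete distribution."""
    return sum(p * x(s) for s, p in pr.items())


# Assumed example distribution: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}


def is_even(face):
    """Characteristic function of the event 'the face is even'."""
    return 1 if face % 2 == 0 else 0


mean_face = exp(die, lambda face: face)   # average face value
prob_even = exp(die, is_even)             # probability of the event 'even'
```

Exact rationals (`Fraction`) are used so that the results can be compared with hand calculation without rounding.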

In fact expected values can also be characterised without referring 
directly to an underlying probability distribution: 


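If a function Exp is of type $(S \to \mathbb{R}_{\geq}) \to \mathbb{R}_{\geq}$, and it is 


non-negative so that $Exp.X \geq 0$ for all $X\colon S \to \mathbb{R}_{\geq}$, 
linear so that for $X,Y\colon S \to \mathbb{R}_{\geq}$ and $c,d\colon \mathbb{R}_{\geq}$ we have 


$$ Exp.(c * X + d * Y) \;=\; c * Exp.X + d * Exp.Y $$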


17 This may occur if the sample space is uncountable, for example; the general 
technique for such cases involves $\sigma$-algebras [GS92]. See Footnote 7 on p.297 for an 
example. 

18The price paid for using discrete distributions is that there are some “everyday” 
situations we cannot describe, such as the uniform “continuous” distribution over the 
real interval [0,1] that might be the result of the program “choose a real number x 
randomly so that $0 \leq x \leq 1$.” We get away with it because no such program can be 
written in pGCL — at least, not at this stage. 

19 Footnote 12 on p. 134 gives a more generous definition. 

20 Although the parentheses may look odd around $\sum$ (we write $(\sum \cdots)$ rather 
than $\sum(\cdots)$), we always indicate the scope of bound variables (like s) with explicit 
delimiters, since it helps to avoid errors when doing calculations. 

21Our “important fact” (p.13) is now stated “if X is the characteristic function of 
some event P, then Exp.X is the probability that event P will occur.” 
and normalised so that it satisfies Exp.1 = 1, where 1 is the 
constant function returning 1 for all arguments in S, 


then it is an expectation over some probability distribution: it 
can be shown that it is expressible uniquely in the form (1.13) 
for some Pr.22


The relevance of the above is that our real-valued expressions over the 
state — what we are calling “expectations” — are random variables, and 
that the expression 


wp.prog.postE , (1.14) 


as a function of initial values for the state variables, is a random variable 
as well. As a function of state variables, it is the expected value of the 
random variable postE (also a function of state variables, but those taken 
after execution) over the distribution of final states produced by executions 
of prog, and so 


$$ pre \;\Rrightarrow\; wp.prog.postE \qquad (1.15) $$


says that pre gives in any initial state a lower bound for the expected 
value of postE in the final distribution reached via execution of prog begun 
in that initial state. 


In general, we call random variables post-expectations when they are to 
be evaluated in a final state, and we call them pre-expectations when they 
are calculated as at (1.14). And, like pre- and postconditions in standard 
programs, if placed “between” two programs a single random variable is a 
post-expectation for the first and a pre-expectation for the second. 

But how do prog and an initial state determine a distribution? In fact the 
underlying distributions are found on the cards of the game from Sec. 1.3 
— the sample space is the set of squares, and each card gives an explicit 
distribution over that space. If we consider the deterministic game, and 
regard “make one move in the game” as a program in its own right, then 
we have a function from initial state to final distribution — the function 
taking a square to the card that square contains.23 For any postcondition 
postE written, say, as an expression over names N of squares, and initial 
square $N_0$, the expression $wp.move.postE \langle N \mapsto N_0 \rangle$ is the expectation of 
postE over the distribution of square names given on the card found at $N_0$. 


22 It is a special case of the Riesz Representation Theorem which states, loosely speaking,
that knowledge of the expectation (assumed to be given directly) of every random 
variable uniquely determines an underlying probability distribution. See for instance 
Feller [Fel71, p. 135]. 

23For nondeterministic programs we are thus considering a function from state to 
sets of distributions, from a square to the set of cards there; again we see the general 
computational model underlying the expectation-transformer semantics. 
For example, in Figs. 1.3.1 and 1.3.2 we see the above features: program 
move is given by the layout of the cards (Fig. 1.3.1); and the resulting pre- 
and post-expectations are tabulated in Fig. 1.3.2. All three tables there are 
random variables over the state space $\{0,\ldots,6\}$. 

When we move to more general programs, we must relax the conditions 
that characterise expectations. If prog is possibly nonterminating — if it is 
recursive or contains abort — then wp.prog.postE may violate the normal- 
isation condition Exp.1 = 1. However as a function which satisfies the first 
two conditions it can still be regarded as an expectation in a weak sense. 
That was shown by Kozen [Koz81] and later Jones [Jon90], who defined 
expectations with regard to “probability distributions” which may sum to 
less than one. Those are in fact a special case of Jones’s evaluations,24 and 
she gave conditions similar to the above for their existence [Jon90, p. 117]. 

Finally, if program prog is not deterministic then we move further away 
from elementary theory, because wp.prog.postE is no longer an expectation 
even in the weak sense: it is not linear. It is still however the minimum of 
a set of expectations: if prog and prog′ are deterministic programs then 
$wp.(prog \sqcap prog').postE$ is the pointwise minimum of the two expectations
wp.prog.postE and wp.prog′.postE. This definition is one of the main 
features of this approach. 

Thus although linearity is lost, it is not gone altogether: we retain so-called
sub-linearity,25 which implies that for any $c_1, c_2\colon \mathbb{R}_{\geq}$ and any program 
prog we still have 


$$ wp.prog.(c_1 * postE_1 + c_2 * postE_2) \;\Lleftarrow\; c_1 * wp.prog.postE_1 + c_2 * wp.prog.postE_2 \;. $$


And clearly non-negativity continues to hold. 
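
A quick numerical check makes the loss of linearity concrete. In the sketch below the two final distributions `d1` and `d2`, the post-expectations and the scalars are all invented for illustration; each distribution stands for a deterministic program run from one fixed initial state, and their demonic choice is modelled as the minimum of the two expected values. The combined pre-expectation comes out strictly greater than the combination of the separate ones, so only the inequality survives.

```python
from fractions import Fraction

# Two assumed deterministic programs, each given by its final distribution
# from one fixed initial state.
d1 = {"s0": Fraction(1, 2), "s1": Fraction(1, 2)}
d2 = {"s0": Fraction(1, 4), "s1": Fraction(3, 4)}


def exp(dist, post):
    """Expected value of post-expectation `post` over distribution `dist`."""
    return sum(p * post[s] for s, p in dist.items())


def wp_demonic(post):
    """Pre-expectation of the demonic choice: minimum over the two programs."""
    return min(exp(d1, post), exp(d2, post))


post1 = {"s0": Fraction(1), "s1": Fraction(0)}
post2 = {"s0": Fraction(0), "s1": Fraction(1)}
c1, c2 = Fraction(2), Fraction(3)

combined = {s: c1 * post1[s] + c2 * post2[s] for s in post1}
lhs = wp_demonic(combined)                             # wp of the combination
rhs = c1 * wp_demonic(post1) + c2 * wp_demonic(post2)  # combination of the wp's
```

The demon may choose differently for `post1` and for `post2` when they are treated separately, but must commit to one choice for the combined post-expectation; that is why `lhs` can exceed `rhs`.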

The characterisations of expectations given above for the simpler cases 
might suggest that non-negative and sublinear functionals uniquely deter- 
mine a set of probability distributions — and, in Chap. 5, that is indeed 
shown to be the case: sublinearity is the key “healthiness condition” for 
expectation transformers.26 


1.5 Basic syntax and semantics of pGCL 


1.5.1 Syntax 


Let prog range over programs and p over real number expressions taking 
values between zero and one inclusive; assume that x stands for a list of 
distinct variables, and expr for a list of expressions (of the same length as x 


24 She was working in a much more general context. 
25 The actual property is slightly more general than we give here; see Sec. 1.6. 
26 Halpern and Pucella [HP02] have recently studied similar properties. 
where appropriate); and let the program scheme C be a program in which 
program names like xxx can appear. The syntax of pGCL is as follows: 


$$ prog \;:=\; \textbf{abort} \;\mid\; \textbf{skip} \;\mid\; x := expr \;\mid\; prog; prog \;\mid\; prog \;{}_{p}\oplus\; prog \;\mid\; prog \sqcap prog \;\mid\; (\textbf{mu}\ xxx \cdot C) \qquad (1.16) $$

The first four constructs, namely abort, skip, assignment and sequential 
composition, are just the conventional ones [Dij76]. 

The remaining constructs are for probabilistic choice, nondeterministic 
choice and recursion: given p in the closed interval [0,1] we write $prog \;{}_{p}\oplus\; prog'$
for the probabilistic choice between programs prog and prog′; they 
have probability p and $1-p$ respectively of being selected. In many cases 
p will be a constant, but in general it can be an expression over the state 
variables. 


1.5.2 Shortcuts and “syntactic sugar” 


For convenience we extend our logic and language with the following 
notations. 


Boolean embedding — For predicate pred we write [pred] for the 
expectation “1 if pred else 0” 27 
Conditional — The conditional 
prog if pred else prog’ 
or if pred then prog else prog’ fi , 
chooses program prog (resp. prog’) if Boolean pred is true (resp. false). 


It is defined $prog \;{}_{[pred]}\oplus\; prog'$. 


If else is omitted then else skip is assumed. (See also the “hybrid” 
conditional of Sec. 3.1.2.) 


Implication-like relations — For expectations exp, exp’ we write 


$exp \Rrightarrow exp'$ for exp is everywhere less than or equal to exp′ 
$exp \equiv exp'$ for exp and exp′ are everywhere equal 
$exp \Lleftarrow exp'$ for exp is everywhere greater than or equal to exp′ 


We distinguish $exp \Rrightarrow exp'$ from $exp \leq exp'$: the former is a statement
about exp and exp′, thus true or false as a whole; the latter 
is itself a Boolean-valued expression over the state, possibly true in 
some states and false in others.28 Similarly we regard $exp = exp'$ as 


27 We will not distinguish predicates from Boolean-valued expressions. 
28 Note that $exp \Rightarrow exp'$ is different again, in fact badly typed if exp and exp′ are 
expectations: one real-valued function cannot “imply” another. 
true in just those states where exp and exp′ are equal, and false in 
the rest. 
The closest standard equivalent of $\Rrightarrow$ is the entailment relation $\models$ 
between predicates,29 and in fact $post \models post'$ exactly when 
$[post] \Rrightarrow [post']$, meaning that the “embedding” of $\models$ is $\Rrightarrow$. 


Multi-way probabilistic choices — A probabilistic choice over N 
alternatives can be written horizontally 


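$$ (prog_1 \,@\, p_1 \;\mid\; \cdots \;\mid\; prog_N \,@\, p_N) $$
or vertically30 
$$ \begin{array}{l} prog_1 \,@\, p_1 \\ prog_2 \,@\, p_2 \\ \quad\vdots \\ prog_N \,@\, p_N \end{array} $$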


in which the probabilities are enumerated and sum to no more 
than one. We can also write a “probabilistic comprehension” 
$(\,\mid i\colon I \cdot prog_i \,@\, p_i)$ over some countable index set I. In general, we 
have 


$$ wp.(prog_1 \,@\, p_1 \mid \cdots \mid prog_N \,@\, p_N).postE \;:=\; p_1 * wp.prog_1.postE + \cdots + p_N * wp.prog_N.postE \;. $$


It means “execute $prog_1$ with probability at least $p_1$, and $prog_2$ with 
probability at least $p_2$...” 31 

If the probabilities sum to 1 exactly, then it is a simple N-way probabilistic
branch; if there is a deficit $1 - \sum_i p_i$, it gives the probability 
of aborting. 

When all the programs $prog_i$ are assignments with the same left-hand 
side, say $x := expr_i$, we write even more briefly 


$$ x := (expr_1 \,@\, p_1 \mid \cdots \mid expr_N \,@\, p_N) \;. $$


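Variations on ${}_{p}\oplus$ — By $prog \oplus_{p} prog'$ we mean $prog' \;{}_{p}\oplus\; prog$, and in 
general we write $prog \;{}_{p}\oplus_{p'}\; prog'$ for 


$$ \begin{array}{ll} prog & @\ p \\ prog' & @\ p' \\ prog \sqcap prog' & @\ 1-(p+p') \;, \end{array} $$


the program that executes prog with probability at least p and prog′ 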


29 One predicate ENTAILS another, written $\models$, just when it implies the other in all 
states. 


30 See Sec. 4.3 for an example of the vertical notation. 
31 It is “at least $p_i$” because if the probabilities sum to less than one there will be an 
“aborting” component, which might behave like $prog_i$. 
with probability at least p′; we assume $p + p' \leq 1$. 
By ${}_{\geq p}\oplus$ we mean ${}_{p}\oplus_{0}$, and so on. (See also (B.3) on p. 328.) 


Demonic choice — We write demonic choice between assignments to 
the same variable x as 


$$ x :\in \{expr_1, expr_2\} \qquad (1.17) $$
or
$$ x := expr_1 \sqcap expr_2 \;, $$


in each case abbreviating $x := expr_1 \;\sqcap\; x := expr_2$. More generally 
we can write $x :\in expr$ or $x :\notin expr$ if expr is set-valued, provided the 
implied choice is finite.32 


Iteration — The construct $(\textbf{mu}\ xxx \cdot C)$ behaves as prescribed by the 
program context C except that it invokes itself recursively whenever 
it reaches a point where the program name xxx appears in C. Then, 
in the usual way, iteration is a special case of recursion: 


$$ \textbf{do}\ pred \to body\ \textbf{od} $$


is $(\textbf{mu}\ xxx \cdot (body; xxx)\ \textbf{if}\ pred\ \textbf{else}\ \textbf{skip})$. 33 (1.18) 
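
The least-fixed-point reading of iteration can be approximated by unfolding the loop from the everywhere-zero expectation. The following sketch does this for one small loop of our own choosing (it is an assumption for illustration, not an example from the text): a single variable x in {0, 1}, guard x = 0, and a body that sets x to 1 or to 0 with probability 1/2 each. The post-expectation is [x = 1].

```python
from fractions import Fraction

STATES = (0, 1)
HALF = Fraction(1, 2)


def pred(x):
    """Loop guard: keep iterating while x = 0."""
    return x == 0


def wp_body(q):
    """wp of the assumed body 'x:= 1 with probability 1/2, else x:= 0'.

    The body ignores the old value of x, so the result is a constant function.
    """
    value = HALF * q[1] + HALF * q[0]
    return {x: value for x in STATES}


def unfold(q, r):
    """One unfolding of the loop: Q becomes (wp.body.Q if pred else R)."""
    body = wp_body(q)
    return {x: body[x] if pred(x) else r[x] for x in STATES}


r = {0: Fraction(0), 1: Fraction(1)}      # post-expectation [x = 1]
q = {x: Fraction(0) for x in STATES}      # start from the least expectation
for _ in range(20):
    q = unfold(q, r)
```

Each unfolding gives a larger lower bound; for this loop the approximations from x = 0 climb towards 1 without reaching it in finitely many steps, which is the sense in which the limit, not any single unfolding, is the loop's pre-expectation.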


1.5.3 Example of syntax: the “Monty Hall” game 


We illustrate the syntax of our language with the example program of 
Fig. 1.5.1. There are three curtains, labelled A, B and C, and a prize is 
hidden nondeterministically behind one of them, say pc. A contestant hopes 
to win the prize by guessing where it is hidden: he chooses randomly to 


32None of our examples requires a choice from the empty set. We see later that the 
finiteness requirement is so that our programs will be continuous (Footnote 60 on p. 71); 
and in some cases — for example, the third and fourth statements of the program shown 
in Fig. 1.5.1 — we rely on type information for that finiteness. 

33 An equivalent but simpler formulation is given by the least fixed-point definition 


$$ wp.(\textbf{do}\ pred \to body\ \textbf{od}).R \;:=\; (\mu Q \cdot wp.body.Q\ \textbf{if}\ pred\ \textbf{else}\ R) \;, \qquad (1.19) $$


which matches Dijkstra’s original formulation more closely [Dij76]. But there is some 
technical work required to get between the two, as we explain later at (7.12). The 
expression on the right can be read 


the least pre-expectation Q such that 
Q = wp.body.Q if pred else R , 


and is called a FIXED POINT because placing Q in the expression does not alter its value 
— this is the mathematical equivalent of “and the same again” when the loop returns 
to its starting point for potentially more iterations. 

The “least,” for us, means the lowest expectation — that reflects the view, appropriate 
for elementary sequential programming, that unending iteration should have little worth 
(in fact, zero). For standard programming, the order is false < true so that taking the 
least fixed-point means adopting the view that an infinite loop does not establish any 
postcondition (i.e., has precondition false). 

A more discriminating treatment of unending computations is given in Part III. 
$pc :\in \{A,B,C\}$;                                                        Prize hidden behind curtain. 
$cc := (A \,@\, \frac{1}{3} \mid B \,@\, \frac{1}{3} \mid C \,@\, \frac{1}{3})$;     Contestant chooses randomly. 
$ac :\notin \{pc, cc\}$;                                                    Another curtain opened; it’s empty. 
$(cc :\notin \{cc, ac\})$ if clever else skip                               Changes his mind, or not? 


The three “curtain” variables ac, cc, pc are of type $\{A,B,C\}$. 
Written in full, the first three statements would be 
$pc := A \;\sqcap\; pc := B \;\sqcap\; pc := C$; 
$cc := A \;{}_{1/3}\oplus\; (cc := B \;{}_{1/2}\oplus\; cc := C)$; 
$ac :\in \{A,B,C\} - \{pc, cc\}$. 
The fourth statement is written using $:\notin$ just for convenience; in fact it executes 
deterministically, since cc and ac are guaranteed to be different at that point. 


Figure 1.5.1. THE “MONTY HALL” PROGRAM 


point to curtain cc. The host then tries to get the contestant to change his 
choice, showing that the prize is not behind some other curtain ac — which 
means that either the contestant has chosen it already or it is behind the 
other closed curtain. 

Should the contestant change his mind? 


1.5.4 Intuitive interpretation of pGCL expectations 


In its full generality, an expectation is a function describing how much each 
program state is “worth.” 

The special case of an embedded predicate [pred] assigns to each state 
a worth of zero or of one: states satisfying pred are worth one, and states 
not satisfying pred are worth zero. The more general expectations arise 
when one estimates, in the initial state of a probabilistic program, what 
the worth of its final state will be. That estimate, the “expected worth” of 
the final state, is obtained by summing over all final states 


the worth of the final state multiplied by the probability the 
program “will go there” from the initial state. 


Naturally the “will go there” probabilities depend on “from where,” and 
so that expected worth is a function of the initial state. 

When the worth of final states is given by [post], the expected worth of 
the initial state turns out to be just the probability that the program will 
reach post. That is because 
$$ \begin{array}{ll}
 & \text{expected worth of initial state} \\[1ex]
\equiv & (\text{probability } prog \text{ reaches } post) * (\text{worth of states satisfying } post) \\
 & {}+ (\text{probability } prog \text{ does not reach } post) * (\text{worth of states not satisfying } post) \\[1ex]
\equiv & (\text{probability } prog \text{ reaches } post) * 1 \\
 & {}+ (\text{probability } prog \text{ does not reach } post) * 0 \\[1ex]
\equiv & \text{probability } prog \text{ reaches } post \;;
\end{array} $$


note we have relied on the fact that all states satisfying post have worth 
one. 
More general analyses of programs prog in practice lead to conclusions 
of the form 
$$ p \;\Rrightarrow\; wp.prog.[post] $$
for some p and post which, given the above, we can interpret in two 


equivalent ways: 


• the expected worth [post] of the final state is at least the value of p 
in the initial state; or 


• the probability that prog will establish post is at least p.34 


Each interpretation is useful, and in the following example we can see 
them acting together: we ask for the probability that two fair coins when 
flipped will show the same face, and calculate 


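$$ \begin{array}{lll}
 & wp.(x := H \;{}_{1/2}\oplus\; x := T \,;\; y := H \;{}_{1/2}\oplus\; y := T).[x = y] & \\[1ex]
\equiv & wp.(x := H \;{}_{1/2}\oplus\; x := T).([x = H]/2 + [x = T]/2) & {}_{1/2}\oplus,\ :=\ \text{and sequential composition} \\[1ex]
\equiv & (1/2)([H = H]/2 + [H = T]/2) + (1/2)([T = H]/2 + [T = T]/2) &
\end{array} $$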


34 We must say “at least” in general, because possible demonic choice in prog means 
that the pre-expectation is only a lower bound for the actual expected value the program 
could deliver; and some analyses give only the weaker $p \Rrightarrow wp.prog.[post]$ in any case. 
See also Footnote 14 on p. 89. 

35 See Fig. 1.5.3 for this definition. 
$$ \begin{array}{lll}
\equiv & (1/2)(1/2 + 0/2) + (1/2)(0/2 + 1/2) & \text{definition } [\cdot] \\[1ex]
\equiv & 1/2 \;. & \text{arithmetic}
\end{array} $$


We can then use the second interpretation above to conclude that the faces 
are the same with probability 1/2.36 
But part of the above calculation involves the more general expression 


$$ wp.(x := H \;{}_{1/2}\oplus\; x := T).([x = H]/2 + [x = T]/2) \;, \qquad (1.20) $$


and what does that mean on its own? It must be given the first interpretation,
that is as an expected worth, since “will establish [x = H]/2 + 
[x = T]/2” makes no sense. Thus it means 


the expected value of the expression [x = H]/2 + [x = T]/2 
after executing the program $x := H \;{}_{1/2}\oplus\; x := T$, 


which the calculation goes on to show is in fact 1/2. But for our overall 
conclusions we do not need to think about the intermediate expressions: 
they are only the “glue” that holds the overall reasoning together.37
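
The two-coin calculation is small enough to replay by hand in code. The sketch below follows the same two steps, first forming the intermediate expectation as a function of x and then averaging it over the first coin; the names `bracket`, `inner` and `same_face` are ours and purely illustrative.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def bracket(condition):
    """Embedding [pred]: 1 if the condition holds, else 0."""
    return Fraction(1) if condition else Fraction(0)


def inner(x):
    """Pre-expectation of [x = y] with respect to the second coin flip.

    As a function of x this is [x = H]/2 + [x = T]/2, the "glue" expression.
    """
    return HALF * bracket(x == "H") + HALF * bracket(x == "T")


# Pre-expectation of the glue expression with respect to the first coin flip.
same_face = HALF * inner("H") + HALF * inner("T")
```

Here `inner` is not a probability of establishing anything; it is only meaningful as an expected worth, which is the first interpretation above. The final value `same_face` can again be read as a probability.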


1.5.5 Semantics 


The probabilistic semantics is derived from generalising the standard se- 
mantics in the way suggested in Sec. 1.3. Let the state space be S. 


Definition 1.5.2 EXPECTATION SPACE The space of expectations over 
S is defined 


$$ ES \;:=\; (S \to \mathbb{R}_{\geq},\ \Rrightarrow) \;, $$


where the entailment relation $\Rrightarrow$, as we have seen, is inherited pointwise 
from the normal $\leq$ ordering in $\mathbb{R}_{\geq}$. The expectation-transformer model for 
programs is 


$$ TS \;:=\; (ES \leftarrow ES,\ \sqsubseteq) \;, $$


where we write the functional arrow backward just to emphasise that such 
transformers map final post-expectations to initial pre-expectations, and 
where the refinement order $\sqsubseteq$ is derived pointwise from entailment $\Rrightarrow$ on 
ES. 


36 (Recall Footnote 34.) If we do know, by other means say, that the program is 
deterministic (though still probabilistic), then we can say the pre-expectation is exact. 
37See p. 271 for an example of this same analogy, but in the context of temporal logic. 
Although both ES and TS are lattices, neither is a complete partial 
order,38 because $\mathbb{R}_{\geq}$ itself is not. (It lacks an adjoined $\infty$ element.) In 
addition, when S is infinite (see e.g. Sec. 8.2 of Part II) we must impose 
the condition on elements of ES that each of them be bounded above by 
some non-negative real.39 


In Fig. 1.5.3 we give a probabilistic semantics to the constructs of our 
language. It has the important feature that the standard programming 
constructs behave as usual, and are described just as concisely. 

Note that our semantics states how wp.prog in each case transforms an 
expression in the program variables: that is, we give a procedure for calcu- 
lating the greatest pre-expectation by purely syntactic manipulation. An 
alternative view is to see the post-expectations as mathematical functions 
of type ES, and the expressions wp.prog are then of type TS. 

The expression-based view is more convenient in an introduction, and 
for the treatment of specific programs; the function-based view is more 
convenient (and, for recursion, necessary) for general properties of expec- 
tation transformers. In this chapter and the rest of Part I we retain the 


38 A PARTIAL ORDER differs from the familiar “total” orders like “$\leq$” in that two
elements can be “incomparable”; the most common example is subset $\subseteq$ between sets, 
which satisfies REFLEXIVITY (a set is a subset of itself), ANTI-SYMMETRY (two sets cannot 
be subsets of each other without being the same set) and TRANSITIVITY (one set within 
a second within a third is a subset of the third directly as well). But it is not true that 
for any two sets one is necessarily a subset of the other. 

A LATTICE is a non-empty partially ordered set where for all x, y in the set there is a 
GREATEST LOWER BOUND $x \sqcap y$ and a LEAST UPPER BOUND $x \sqcup y$. This holds e.g. for 
the lattice of sets, as above; but the collection of non-empty sets is not a lattice, because 
$x \cap y$ (which is how $x \sqcap y$ is written for sets) is not necessarily non-empty even if x and 
y are. 

A partial order $\sqsubseteq$ is CHAIN- or DIRECTED COMPLETE (then called a CPO) when it 
contains all limits of chains or directed sets respectively, where a CHAIN is a set totally 
ordered by $\sqsubseteq$ and a set is $\sqsubseteq$-DIRECTED if for any x, y in the set there is a z also in the 
set such that $x, y \sqsubseteq z$. (Since a chain is directed, directed completeness implies chain 
completeness; in fact with the Axiom of Choice, chain- and directed completeness are 
equivalent.) 

All of these details can be found in standard texts [DP90]. 

39There is a difference between requiring that there be an upper bound for all expec- 
tations (we do not) and requiring that each expectation separately have an upper bound 
(we do). 

In the first case, we would be saying that there is some M such that every expectation 
$\alpha$ in ES satisfied $\alpha \Rrightarrow M$. That would be convenient because it would make both ES 
and TS complete partial orders, trivially; and that would e.g. allow us to use a standard 
treatment of fixed points. 

But we adopt the second case where, for each expectation $\alpha$ separately, there is some 
$M_\alpha$ such that $\alpha \Rrightarrow M_\alpha$; and, as $\alpha$ varies, these $M_\alpha$'s can increase without bound. That 
is why ES is not complete and is, therefore, why we will need a slightly special argument 
when dealing with fixed points. 
$$ \begin{array}{lcl}
wp.\textbf{abort}.postE & := & 0 \\
wp.\textbf{skip}.postE & := & postE \\
wp.(x := expr).postE & := & postE \langle x \mapsto expr \rangle \\
wp.(prog; prog').postE & := & wp.prog.(wp.prog'.postE) \\
wp.(prog \sqcap prog').postE & := & wp.prog.postE \ \textbf{min}\ wp.prog'.postE \\
wp.(prog \;{}_{p}\oplus\; prog').postE & := & p * wp.prog.postE + \overline{p} * wp.prog'.postE
\end{array} $$


Recall that $\overline{p}$ is the complement of p. 


The expression on the right gives the greatest pre-expectation of post with re- 
spect to each pGCL construct, where post is an expression of type ES over the 
variables in state space S. (For historical reasons we continue to write wp instead 


of gp.) 


In the case of recursion, however, we cannot give a purely syntactic definition. 
Instead we say that 


$(\textbf{mu}\ xxx \cdot C)$ := least fixed-point of the function $cntx\colon TS \to TS$ 
defined so that $cntx.(wp.xxx) = wp.C$. 40 


Figure 1.5.3. PROBABILISTIC wp-SEMANTICS OF pGCL 


expression-based view as far as possible; but in Part II we use the more 
mathematical notation. (See for example Sec. 5.3.) 


The worst program abort cannot be guaranteed to terminate in any 
proper state and therefore maps every post-expectation to 0. The imme- 
diately terminating program skip does not change anything, therefore the 
expected value of post-expectation postE after execution of skip is just its 
actual value before. The pre-expectation of the assignment $x := expr$ is 
the postcondition with the expression expr substituted for x. Sequential 
composition is functional composition. The semantics of demonic choice $\sqcap$ 
reflects the dual metaphors for it: as abstraction, we must take the mini- 
mum because we are giving a guarantee over all possible implementations; 
as a demon’s behaviour, we assume he acts to make our expected winnings 
as small as possible. 

The pre-expectation of probabilistic choice is the weighted average of the 
pre-expectations of its branches. Since any such average is no less than the 
minimum it follows immediately that probabilistic choice refines demonic 


40 Because TS is not complete, to ensure existence of the fixed point we insist that 
the transformer-to-transformer function cntx be “feasibility-preserving,” i.e. that if applied
to a feasible transformer it returns a feasible transformer again. “Feasibility” of 
transformers is one of the “healthiness conditions” we will encounter in Sec. 1.6. For 
convenience, we usually assume that cntx is continuous as well. 

See Lem. 5.6.8 on p. 148. 
choice, which corresponds to our intuition. In fact we consider probabilistic
choice to be a deterministic programming construct; that is we say that a
program is deterministic if it is free of demonic nondeterminism unless it
aborts.41

Finally, recursive programs have least-fixed-point semantics as usual. 
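The informal reading of the wp-semantics given above can be tried out on small programs. The following is a minimal sketch in Python, not the book's own formulation: a program is represented directly by its expectation transformer, states are dictionaries, and all function names (abort, skip, assign, seq, demonic, prob, bracket) are hypothetical choices made for this illustration. Recursion is left out.

```python
from fractions import Fraction

# A program is modelled by its wp: a function taking a post-expectation
# (state -> number) and returning the pre-expectation (state -> number).

def abort():
    # abort maps every post-expectation to 0.
    return lambda post: (lambda s: Fraction(0))

def skip():
    # skip leaves the post-expectation unchanged.
    return lambda post: post

def assign(var, expr):
    # Substitute expr for var in the post-expectation.
    return lambda post: (lambda s: post({**s, var: expr(s)}))

def seq(first, second):
    # Sequential composition is functional composition.
    return lambda post: first(second(post))

def demonic(left, right):
    # Guarantee over all implementations: pointwise minimum.
    return lambda post: (lambda s: min(left(post)(s), right(post)(s)))

def prob(p, left, right):
    # Weighted average: left with probability p, right with 1 - p.
    return lambda post: (lambda s: p * left(post)(s) + (1 - p) * right(post)(s))

def bracket(pred):
    # Embedding [pred] of a Boolean predicate as a 0/1 expectation.
    return lambda s: Fraction(int(pred(s)))

set1 = assign("x", lambda s: 1)
set2 = assign("x", lambda s: 2)
post = bracket(lambda s: s["x"] == 1)

pre_demonic = demonic(set1, set2)(post)({"x": 0})
pre_prob = prob(Fraction(1, 2), set1, set2)(post)({"x": 0})
print(pre_demonic, pre_prob)  # 0 1/2
```

On this example the probabilistic choice yields 1/2 where the demonic choice guarantees only 0, in line with the remark that probabilistic choice refines demonic choice.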


1.5.6 Example of semantics: Monty Hall again 


We illustrate the semantics by returning to the program of Fig. 1.5.1. Con- 
sider the post-expectation [pc = cc], which takes value one just in those 
final states in which the candidate has correctly chosen the prize. Work- 
ing backwards through the program’s four statements, we have first (by 
standard wp calculations) that 


    wp.(cc:∉ {cc, ac} if clever else skip).[pc = cc]
≡   [clever] * [{ac, cc, pc} = {A,B,C}] + [¬clever] * [pc = cc] ,


because (in case clever) the nondeterministic choice is guaranteed to pick
pc only when it cannot avoid doing so.42
Standard reasoning suffices for our next step also: 


    wp.(ac:∉ {pc, cc}).
       ([clever] * [{ac, cc, pc} = {A,B,C}] + [¬clever] * [pc = cc])
≡   [clever] * [pc ≠ cc] + [¬clever] * [pc = cc] .


For the clever case note that {ac, cc, pc} = {A,B,C} holds (in the
post-expectation) iff all three elements differ, and that the statement itself
establishes only two of the required three inequalities, namely that ac ≠ pc
and ac ≠ cc. The weakest precondition supplies the third.

For the ¬clever case note that neither pc nor cc is assigned to by
ac:∉ {pc, cc}, so that pc = cc holds afterwards iff it held before.

The next statement is probabilistic, and so produces a probabilistic pre- 
expectation involving the factors 1/3 given explicitly in the program; we 
have 


    wp.(cc:= (A @ 1/3 | B @ 1/3 | C @ 1/3)).
       ([clever] * [pc ≠ cc] + [¬clever] * [pc = cc])
≡   [clever]/3 * ([pc ≠ A] + [pc ≠ B] + [pc ≠ C])
      + [¬clever]/3 * ([pc = A] + [pc = B] + [pc = C])


41 Some writers call that PRE-DETERMINISM: “deterministic if terminating.”

42 In Fig. 1.5.1 we said that this fourth statement “executes deterministically”; yet
here we have called it nondeterministic.

On its own, it is nondeterministic; but in the context of the program its nondetermin- 
ism is limited to making a choice from a singleton set, as our subsequent calculations 
will show. 
≡   ([clever]/3) * 2 + ([¬clever]/3) * 1          type of pc is {A,B,C} 43
≡   2[clever]/3 + [¬clever]/3 .


Then for the first statement pc:∈ {A,B,C} we only note that pc does
not appear in the final condition above, thus leaving it unchanged under 
wp: with simplification it becomes 


(1 + [clever])/3 , 


which is thus the pre-expectation for the whole program. 

Since the post-expectation [pc = cc] is standard (it is the characteristic 
function of the set of states in which pc = cc), we are able to interpret the 
pre-expectation directly as the probability that pc = cc will be satisfied on 
termination: we conclude that the contestant has 2/3 probability of finding 
the prize if he is clever, and only 1/3 if he is not. 
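The calculation above can be cross-checked mechanically. The sketch below is an illustration under stated assumptions rather than the program text of Fig. 1.5.1: the four statements are encoded as expectation transformers, clever is treated as a Python parameter instead of a program variable, and the helper names (choose_demonic, choose_uniform, others, monty) are hypothetical.

```python
from fractions import Fraction

CURTAINS = ("A", "B", "C")

def choose_demonic(var, options):
    # wp of a demonic choice of var from options(s): minimise over the choices.
    def wp(post):
        return lambda s: min(post({**s, var: v}) for v in options(s))
    return wp

def choose_uniform(var, values):
    # wp of a uniform probabilistic choice of var: average over the values.
    def wp(post):
        return lambda s: sum(post({**s, var: v}) for v in values) / len(values)
    return wp

def seq(*progs):
    # Sequential composition: apply the transformers from last to first.
    def wp(post):
        for p in reversed(progs):
            post = p(post)
        return post
    return wp

def others(*names):
    # The curtains different from the current values of the named variables.
    return lambda s: [c for c in CURTAINS if c not in {s[n] for n in names}]

def monty(clever):
    switch = choose_demonic("cc", others("cc", "ac")) if clever else (lambda post: post)
    return seq(
        choose_demonic("pc", lambda s: CURTAINS),   # pc chosen demonically
        choose_uniform("cc", CURTAINS),             # contestant chooses uniformly
        choose_demonic("ac", others("pc", "cc")),   # a curtain is opened
        switch,                                     # contestant switches, or not
    )

wins = lambda s: Fraction(int(s["pc"] == s["cc"]))  # post-expectation [pc = cc]

print(monty(True)(wins)({}))   # 2/3
print(monty(False)(wins)({}))  # 1/3
```

The two printed values agree with (1 + [clever])/3 evaluated for clever true and false.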


1.6 Healthiness and algebra for pGCL 


Recall that all standard GCL constructs satisfy the important property of
conjunctivity;44 that is, for any GCL command prog and post-conditions
post, post′ we have


wp.prog.(post ∧ post′)  =  wp.prog.post ∧ wp.prog.post′ .


That “healthiness condition” [Dij76] is used to prove many general 
properties of programs. 

In pGCL the healthiness condition becomes “sublinearity,” a generalisa- 
tion of conjunctivity: 45 


Definition 1.6.1 SUBLINEARITY OF pGCL Let c0, c1, c2 be non-negative
reals, and postE1, postE2 expectations; then all pGCL constructs prog
satisfy


    wp.prog.(c1 * postE1 + c2 * postE2 ⊖ c0)
⇚   c1 * wp.prog.postE1 + c2 * wp.prog.postE2 ⊖ c0 ,


which property of prog is called sublinearity. Truncated subtraction ⊖ is
defined


x ⊖ y := (x − y) max 0 ,


43 Footnote 50 on p. 33 explains how typing might be propagated this way. 

44 They satisfy monotonicity too, which is implied by conjunctivity.

45 Having discovered a probabilistic analogue of conjunctivity, we naturally ask for an 
analogue of disjunctivity. That turns out to be “super-linearity” — which when combined 
with sublinearity gives (just) linearity, and is characteristic of deterministic probabilistic 
programs, just as disjunctivity (with conjunctivity) characterises deterministic standard 
programs. See Sec. 8.3. 
the maximum of the normal difference and zero. It has syntactic precedence
lower than +. 
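Sublinearity can be spot-checked numerically on a concrete program. The sketch below is only an illustration, not a proof: it assumes the small program x:= 0 ⊓ (x:= 1 ½⊕ x:= 2), chosen here as an example, represents an expectation as a table indexed by the final value of x, and draws random non-negative coefficients; the names wp_prog, monus and sublinear_once are hypothetical.

```python
import random
from fractions import Fraction

VALUES = (0, 1, 2)

def wp_prog(post):
    # wp of  x := 0  ⊓  (x := 1  ½⊕  x := 2); independent of the initial state.
    return min(post[0], (post[1] + post[2]) / 2)

def monus(a, b):
    # Truncated subtraction a ⊖ b.
    return max(a - b, Fraction(0))

def sublinear_once(rng):
    # Draw c0, c1, c2 and two expectations, then test the inequality once.
    c0, c1, c2 = (Fraction(rng.randint(0, 8), 4) for _ in range(3))
    e1 = {v: Fraction(rng.randint(0, 8), 4) for v in VALUES}
    e2 = {v: Fraction(rng.randint(0, 8), 4) for v in VALUES}
    # ⊖ binds more loosely than +, so it is applied to the whole sum.
    combined = {v: monus(c1 * e1[v] + c2 * e2[v], c0) for v in VALUES}
    lhs = wp_prog(combined)
    rhs = monus(c1 * wp_prog(e1) + c2 * wp_prog(e2), c0)
    return lhs >= rhs

rng = random.Random(0)
print(all(sublinear_once(rng) for _ in range(10_000)))
```

Exact rational arithmetic is used so that the comparison is not affected by rounding.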


Although sublinearity has a strange appearance, we can extract from it
a number of very useful consequences, as we now show. We begin with
monotonicity, feasibility and scaling.46


Definition 1.6.2 HEALTHINESS CONDITIONS 


• monotonicity: increasing a post-expectation can only increase the
pre-expectation. Suppose postE ⇛ postE′ for two expectations
postE, postE′; then


    wp.prog.postE′
≡   wp.prog.(postE + (postE′ − postE))
⇚   wp.prog.postE + wp.prog.(postE′ − postE)      postE′ − postE ⇚ 0, hence well defined;
                                                   sublinearity with c0,c1,c2 := 0,1,1
⇚   wp.prog.postE .                                0 ⇛ wp.prog.(postE′ − postE)


• feasibility: pre-expectations cannot be “too large.” First note that


    wp.prog.0
≡   wp.prog.(2 * 0)
⇚   2 * wp.prog.0 ,          sublinearity with c0,c1,c2 := 0,2,0


so that wp.prog.0 must be zero.
Now write max postE for the maximum of postE over all its variables’
values; then


    0
≡   wp.prog.0                          feasibility above
≡   wp.prog.(postE ⊖ max postE)        postE ⊖ max postE ≡ 0
⇚   wp.prog.postE ⊖ max postE .        c0, c1, c2 := max postE, 1, 0


But from 0 ⇚ wp.prog.postE ⊖ max postE we have trivially that

wp.prog.postE ⇛ max postE ,          (1.21)

which we identify as the feasibility condition for pGCL.47


• scaling: multiplication by a non-negative constant distributes through
commands. Note first that wp.prog.(c * postE) ⇚ c * wp.prog.postE
directly from sublinearity.


46 These properties are collected together in Sec. 5.6, and restated in Part II as
Defs. 5.6.3—5.6.5. 

47 Note how the general (1.21) implies the STRICTNESS condition wp.prog.0 = 0, a direct 
numeric embedding of Dijkstra’s Law of the Excluded Miracle. 
For ⇛ we have two cases: when c is zero, trivially from feasibility
wp.prog.(0 * postE)  ≡  wp.prog.0  ≡  0  ≡  0 * wp.prog.postE ;
and for the other case c ≠ 0 we reason

    wp.prog.(c * postE)
≡   c(1/c) * wp.prog.(c * postE)          c ≠ 0
⇛   c * wp.prog.((1/c)c * postE)          sublinearity using 1/c
≡   c * wp.prog.postE ,


thus establishing wp.prog.(c * postE) ≡ c * wp.prog.postE generally.
(See p. 53 for an example of scaling’s use.)
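The three derived conditions can be confirmed by exhaustive enumeration on a small instance. The sketch below assumes the example program x:= 0 ⊓ (x:= 1 ½⊕ x:= 2), which is not taken from the text, and restricts expectations to a few rational levels; wp_prog and the other names are hypothetical.

```python
from fractions import Fraction
from itertools import product

LEVELS = (Fraction(0), Fraction(1, 2), Fraction(1))

def wp_prog(post):
    # wp of  x := 0  ⊓  (x := 1  ½⊕  x := 2), for post given as the triple
    # (post at x = 0, post at x = 1, post at x = 2).
    return min(post[0], (post[1] + post[2]) / 2)

expectations = list(product(LEVELS, repeat=3))

# Monotonicity: a pointwise larger post-expectation gives a larger result.
monotonic = all(
    wp_prog(e) <= wp_prog(f)
    for e in expectations for f in expectations
    if all(a <= b for a, b in zip(e, f))
)
# Feasibility (1.21): the pre-expectation never exceeds max postE.
feasible = all(wp_prog(e) <= max(e) for e in expectations)
# Scaling: non-negative constants distribute through wp.prog.
scaling = all(
    wp_prog(tuple(c * a for a in e)) == c * wp_prog(e)
    for e in expectations
    for c in (Fraction(0), Fraction(1, 3), Fraction(5))
)
print(monotonic, feasible, scaling)
```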


The remaining property we examine is so-called “probabilistic
conjunctivity.” Since standard conjunction “∧” is not defined over numbers, we
have many choices for a probabilistic analogue “&” of it, requiring only
that


0 & 0  =  0
0 & 1  =  0
1 & 0  =  0
1 & 1  =  1          (1.22)


for consistency with embedded Booleans.

Obvious possibilities for & are multiplication × and minimum min, and
each of those has its uses; but neither satisfies anything like a generalisation
of conjunctivity. Return for example to the program of Fig. 1.5.1, and
consider its second statement


cc:= (A @ 1/3 | B @ 1/3 | C @ 1/3) .


Writing prog for the above, with postcondition [cc ≠ C] min [cc ≠ A] we
find


    wp.prog.([cc ≠ C] min [cc ≠ A])
≡   wp.prog.[cc ≠ C ∧ cc ≠ A]
≡   wp.prog.[cc = B]
≡   1/3
<   2/3 min 2/3
≡   wp.prog.[cc ≠ C] min wp.prog.[cc ≠ A] .


Thus probabilistic programs do not distribute min in general, and we must 
find something else. Instead we define 


exp & exp′ := exp + exp′ ⊖ 1 ,          (1.23)
whose right-hand side is inspired by sublinearity when c0, c1, c2 := 1, 1, 1.
The operator is commutative; and if we restrict expectations to [0,1] it is
associative as well. Note however that it is not idempotent.48
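The algebraic claims about & are easy to check on a grid of values. In the sketch below the function name pand is a hypothetical choice for the operator of (1.23); the grid and the counterexamples are picked only for illustration.

```python
from fractions import Fraction
from itertools import product

def pand(a, b):
    # Probabilistic conjunction (1.23): a & b = (a + b - 1) max 0.
    return max(a + b - 1, 0)

grid = [Fraction(k, 4) for k in range(5)]  # 0, 1/4, ..., 1

# Agreement with Boolean conjunction on 0 and 1, as required by (1.22).
boolean_ok = [pand(a, b) for a, b in product((0, 1), repeat=2)] == [0, 0, 0, 1]
commutative = all(pand(a, b) == pand(b, a) for a, b in product(grid, repeat=2))
associative = all(
    pand(pand(a, b), c) == pand(a, pand(b, c))
    for a, b, c in product(grid, repeat=3)
)
half = Fraction(1, 2)
print(boolean_ok, commutative, associative)
print(pand(half, half))                          # 0, not 1/2: & is not idempotent
print(pand(pand(0, 0), 3), pand(0, pand(0, 3)))  # 2 1: not associative outside [0,1]
```

The last line shows why the restriction to [0,1] matters for associativity.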

We now state a (sub-)distribution property for &, a direct consequence 
of sublinearity. 


sub-conjunctivity: the operator & sub-distributes through expectation
transformers. From sublinearity with c0, c1, c2 := 1, 1, 1 we have


wp.prog.(postE & postE′)  ⇚  wp.prog.postE & wp.prog.postE′
for all prog.


(Unfortunately there does not seem to be a full (=) conjunctivity property 
for expectation transformers.) 


Beyond sub-conjunctivity, we say that & generalises conjunction for sev- 
eral other reasons as well. The first is of course that it satisfies the standard 
properties (1.22). 

The second reason is that sub-conjunctivity (a consequence of sub- 
linearity) implies “full” conjunctivity for standard programs. Standard 
programs, containing no probabilistic choices, take standard [post ]-style 
post-expectations to standard pre-expectations: they are the embedding of 
GCL in pGCL, and for standard prog we now show that 


    wp.prog.([post] & [post′])          (1.24)
≡   wp.prog.[post] & wp.prog.[post′] .
First note that “⇚” comes directly from sub-conjunctivity above, taking
postE, postE′ to be [post], [post′].
For “⇛” we appeal to monotonicity, because [post] & [post′] ⇛ [post]
whence wp.prog.([post] & [post′]) ⇛ wp.prog.[post], and similarly for post′.
Putting those together gives


wp.prog.([post] & [post′])  ⇛  wp.prog.[post] min wp.prog.[post′] ,


by elementary arithmetic properties of ⇛. But on standard expectations
(which wp.prog.[post] and wp.prog.[post′] are, because prog is standard)
the operators min and & agree.


A last attribute linking & to ∧ comes straight from elementary probability
theory. Let X and Y be two events, not necessarily independent:
then


if the probability of X is at least p, and the probability of Y is
at least q, the most that can be said in general about the joint
event X ∩ Y is that it has probability at least p & q.


48 A binary operator ⊙ is IDEMPOTENT just when x ⊙ x = x for all x.
To see this, we begin by recalling that for any events X, Y and any
probability distribution Pr we have49


    Pr.(X ∩ Y)
=   Pr.X + Pr.Y − Pr.(X ∪ Y)
≥   (Pr.X + Pr.Y − 1) ⊔ 0 .          because Pr.(X ∪ Y) ≤ 1 and Pr.(X ∩ Y) ≥ 0


We are not dealing with exact probabilities however: when demonic non- 
determinism is present we have only lower bounds. Thus we address the 
question 


Given only Pr.X ≥ p and Pr.Y ≥ q, what is the most precise
lower bound for Pr.(X ∩ Y) in terms of p and q?


From the reasoning above we obtain

(p + q − 1) ⊔ 0          (1.25)


immediately as a lower bound. But to see that it is the greatest lower bound
we must show that for any X, Y, p, q there is a probability distribution
Pr such that the bound is attained; and that is illustrated in Fig. 1.6.3,
where an explicit distribution is given in which Pr.X = p, Pr.Y = q and
Pr.(X ∩ Y) is as low as possible, reaching (p + q − 1) ⊔ 0 exactly.
Returning to our example, but using &, we now have equality:

    wp.prog.([cc ≠ C] & [cc ≠ A])
≡   wp.prog.[cc = B]
≡   1/3
≡   2/3 & 2/3
≡   wp.prog.[cc ≠ C] & wp.prog.[cc ≠ A] .


The & operator also plays a crucial role in the proof (Chap. 7) of our 
probabilistic loop rule, presented in Chap. 2 and used in the examples to 
come. 
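The claim that p & q is the greatest lower bound can be illustrated by building the witnessing distribution explicitly. The sketch below follows the cell probabilities described for Fig. 1.6.3; the function names witness and check, and the grid of test values, are assumptions made for this illustration.

```python
from fractions import Fraction

def witness(p, q):
    # Probabilities of the four cells: X and Y, X only, Y only, neither.
    both = max(p + q - 1, Fraction(0))
    x_only = min(p, 1 - q)
    y_only = min(q, 1 - p)
    neither = 1 - both - x_only - y_only
    return both, x_only, y_only, neither

def check(p, q):
    both, x_only, y_only, neither = witness(p, q)
    return (
        min(both, x_only, y_only, neither) >= 0   # a genuine distribution
        and both + x_only == p                    # Pr.X = p
        and both + y_only == q                    # Pr.Y = q
        and both == max(p + q - 1, 0)             # Pr.(X ∩ Y) = p & q
    )

grid = [Fraction(k, 10) for k in range(11)]
print(all(check(p, q) for p in grid for q in grid))
print(witness(Fraction(2, 3), Fraction(2, 3))[0])  # 1/3
```

The final line reproduces the value 2/3 & 2/3 = 1/3 from the example above.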


1.7 Healthiness example: modular reasoning 


As an example of the use of healthiness conditions, we formulate and prove 
a simple but very powerful property of pGCL programs, important for 
“modular” reasoning about them. 

By modular reasoning in this case we mean determining, first, that a 
program prog of interest has some standard property; then for subsequent 
(possibly probabilistic) reasoning we assume that property. This makes 


49 The first step is the modularity law for probabilities.
Pr.X        =  p ⊓ (1−q) + (p+q−1) ⊔ 0  =  p
Pr.Y        =  q ⊓ (1−p) + (p+q−1) ⊔ 0  =  q
Pr.(X ∩ Y)  =  (p+q−1) ⊔ 0  =  p & q


The lower bound p & q is the best possible.


Figure 1.6.3. PROBABILISTIC CONJUNCTION & DEPICTED


the reasoning modular in the sense that we do not have to prove all the 
properties at once.50 
We formulate the principle as a lemma. 


Lemma 1.7.1 MODULAR REASONING Suppose for some program prog 
and predicates pre and post we have 


[pre]  ⇛  wp.prog.[post] ,          (1.26)


which is just the embedded form of a standard Hoare-triple specification.
Then in any state satisfying pre we have for any bounded post-expectations
postE, postE′ that


wp.prog.postE  =  wp.prog.postE′ ,51


provided post implies that postE and postE' are equal. 
That is, with (1.26) we can assume the truth of post when reasoning 
about the post-expectation, provided pre holds in the initial state. 


50 A typical use of this appeals to standard reasoning, in a “first pass,” to establish
that some (Boolean) property — such as a variable’s typing — is invariant in a program; 
then, in the “second pass” during which probabilistic reasoning might be carried out, we 
can assume that invariant everywhere without comment. Recall Footnote 43 on p. 28; 
see also the treatment of Fig. 7.7.11 on p. 211 to come. 

51 We write “=” rather than “≡” because the equality holds only in some states
(those satisfying pre), as indicated in the text above. Thus writing “≡, ⇛, ⇚” as we
do elsewhere is just an alternative for the text “in all states”.
Proof: We use the healthiness conditions of the previous section, and
we assume that the post-expectations postE, postE′ are bounded above by
some nonzero M. Given that the current state satisfies pre, we then have


    wp.prog.([post] * postE)
≡   M * wp.prog.([post] * postE/M)                  scaling
≡   M * wp.prog.([post] & (postE/M))                [post] is standard; postE/M ⇛ 1
⇚   M * (wp.prog.[post] & wp.prog.(postE/M))        sub-conjunctivity
⇚   M * ([pre] & wp.prog.(postE/M))                 Assumption (1.26)
=   M * (1 & wp.prog.(postE/M))                     pre holds in current state
≡   M * wp.prog.(postE/M)                           arithmetic
≡   wp.prog.postE .                                 scaling


The opposite inequality is immediate (in all states) from the monotonicity
healthiness property, since [post] * postE ⇛ postE. Thus, still assuming pre
in the current state, we conclude with


    wp.prog.postE
=   wp.prog.([post] * postE)          above
≡   wp.prog.([post] * postE′)         assumption about postE, postE′
=   wp.prog.postE′ .                  as above, but for postE′


This kind of reasoning is nothing new for standard programs, and indeed 
is usually taken for granted (although its formal justification appeals to 
conjunctivity). It is important that it is available in pGCL as well.52
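Lemma 1.7.1 can be illustrated on a concrete instance. Everything in the sketch below is an assumption chosen for the illustration: the program x:= x+1 ½⊕ x:= x+2, the predicates pre (x ≥ 0) and post (x > 0), and two bounded post-expectations that agree exactly where post holds.

```python
from fractions import Fraction

def wp_prog(post):
    # wp of  x := x + 1  ½⊕  x := x + 2, with the state being just x.
    return lambda x: (post(x + 1) + post(x + 2)) / 2

pre = lambda x: x >= 0
post = lambda x: x > 0
bracket = lambda pred: (lambda x: Fraction(int(pred(x))))

# Two bounded post-expectations that agree wherever post holds.
postE = lambda x: Fraction(max(0, min(x, 10)))
postE2 = lambda x: Fraction(min(abs(x), 10))

states = range(-5, 6)
# (1.26): [pre] is everywhere no more than wp.prog.[post].
spec_holds = all(bracket(pre)(x) <= wp_prog(bracket(post))(x) for x in states)
# Conclusion of the lemma: equality in every state satisfying pre.
equal_under_pre = all(
    wp_prog(postE)(x) == wp_prog(postE2)(x) for x in states if pre(x)
)
print(spec_holds, equal_under_pre)
print(wp_prog(postE)(-3), wp_prog(postE2)(-3))  # 0 3/2: no guarantee outside pre
```

The last line shows that the two pre-expectations may differ in a state where pre fails, which is why the lemma is stated only for states satisfying pre.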


1.8 Interaction between probabilistic- 
and demonic choice 


We conclude with some illustrations of the interaction of demonic and
probabilistic choice. Consider two variables x, y, one chosen demonically and
the other probabilistically. Suppose first that x is chosen demonically and
y probabilistically, and take post-expectation [x = y]. Then


52 Lem. 1.7.1 holds even when postE, postE′ are unbounded, provided of course that
wp.prog is defined for them; the proof of that can be given by direct reference to the 
definition of wp over the model, as set out in Chap. 5. 

We will need that extension for our occasional excursions beyond the “safe” bounded 
world we have formally dealt with in the logic (e.g. Sections 2.11 and 3.3). 
    wp.((x:= 1 ⊓ x:= 2); (y:= 1 ½⊕ y:= 2)).[x = y]
≡   wp.(x:= 1 ⊓ x:= 2).([x = 1]/2 + [x = 2]/2)
≡   ([1 = 1]/2 + [1 = 2]/2) min ([2 = 1]/2 + [2 = 2]/2)
≡   (1/2 + 0/2) min (0/2 + 1/2)
≡   1/2 ,


from which we see that the program establishes x = y with probability at least
1/2: no matter which value is assigned to x, with probability 1/2 the second
command will assign the same to y.

Now suppose instead that it is the second choice that is demonic. Then 
we have 


    wp.((x:= 1 ½⊕ x:= 2); (y:= 1 ⊓ y:= 2)).[x = y]
≡   wp.(x:= 1 ½⊕ x:= 2).([x = 1] min [x = 2])
≡   ([1 = 1] min [1 = 2])/2 + ([2 = 1] min [2 = 2])/2
≡   (1 min 0)/2 + (0 min 1)/2
≡   0 ,


reflecting that no matter what value is assigned probabilistically to x, the
demon could choose subsequently to assign a different value to y.

Thus it is clear that the order in which the two choices are executed
plays a critical role in their interaction, and in particular that the demon
in the first case cannot make the assignment “clairvoyantly” to x in order
to avoid the value that later will be assigned to y.
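Both calculations can be replayed with a few lines of Python. The sketch below encodes each program by its expectation transformer; the combinator names (assign, seq, demonic, fair) and the starting state are hypothetical choices made for the illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def assign(var, value):
    # wp of var := value: evaluate the post-expectation in the updated state.
    return lambda post: (lambda s: post({**s, var: value}))

def seq(first, second):
    # Sequential composition is functional composition.
    return lambda post: first(second(post))

def demonic(left, right):
    # Demonic choice: pointwise minimum.
    return lambda post: (lambda s: min(left(post)(s), right(post)(s)))

def fair(left, right):
    # Probabilistic choice with probability 1/2 each: the average.
    return lambda post: (lambda s: HALF * left(post)(s) + HALF * right(post)(s))

equal = lambda s: Fraction(int(s["x"] == s["y"]))  # post-expectation [x = y]
start = {"x": 0, "y": 0}

# Demon chooses x first, then y is chosen probabilistically.
demon_first = seq(demonic(assign("x", 1), assign("x", 2)),
                  fair(assign("y", 1), assign("y", 2)))
# x is chosen probabilistically first, then the demon chooses y.
demon_second = seq(fair(assign("x", 1), assign("x", 2)),
                   demonic(assign("y", 1), assign("y", 2)))

print(demon_first(equal)(start))   # 1/2
print(demon_second(equal)(start))  # 0
```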


1.9 Summary 


Being able to reason formally about probabilistic programs does not of 
course remove per se the complexity of the mathematics on which they rely: 
we do not now expect to find astonishingly simple correctness proofs for 
all the large collection of randomised algorithms that have been developed 
over the decades [MR95]. However it should be possible in principle to 
locate and determine reliably what are the probabilistic/mathematical facts 
the construction of a randomised algorithm needs to exploit... which is 
of course just what standard predicate transformers do for conventional 
algorithms. 

In the remainder of Part I we concentrate on proof rules that can be 
derived for pGCL — principally for loops — and on examples. 


The theory of expectation transformers with nondeterminism is given 
in Part II, where in particular the role of sublinearity is identified and 
proved: it characterises a subspace of the predicate transformers that has 
an equivalent operational semantics of relations between initial and final 
probabilistic distributions over the state space — a formalisation of the 
gambling game of Sec. 1.3. All the programming constructs of the
probabilistic language of guarded commands belong to that subspace, which
means that the programmer who uses the language can elect to reason 
about it either axiomatically or operationally. 


Chapter notes 


In the mid-1970’s, Rabin demonstrated how randomisation could be used to solve 
a variety of programming problems [Rab76]; since then, the range of applications 
has increased considerably [MR95], and indeed we analyse several of them as case 
studies in later chapters. In the meantime — fuelled by randomisation’s impres- 
sive applicability — the search for an effective logic of probabilistic programs 
became an important research topic around the beginning of the 1980’s, and 
remained so until the mid-1990’s. Ironically, the major technical difficulty was 
due, in the main, to one of standard programming’s major successes: demonic 
nondeterminism, the basis for abstraction. It was a challenging problem to de- 
cide what to do about it, and how it should interact with the new probabilistic 
nondeterminism. 


The first probabilistic logics did not treat demonic nondeterminism at all — 
Feldman and Harel [FH84] for instance proved soundness and completeness for 
a probabilistic PDL which was (in our terms) purely deterministic. The logical 
language allowed statements about programs to be made at the level of probabil- 
ity distributions and, as we discuss in Sec. A.2, that proves to be an impediment 
to the natural introduction of a demon. A Hoare-style logic based on similar 
principles has also been explored by den Hartog and de Vink [dHdV02]. 


The crucial step of a quantitative logic of expectations was taken by Kozen 
[Koz85]. Subsequently Jones [Jon90], with Plotkin and using the evaluations from 
earlier work of Saheb-Djahromi [SD80] that were based directly on topologies 
rather than on σ- or Borel algebras, worked on more general probabilistic
powerdomains; as an example of her technique she specialised it to the Kozen-style
logic for deterministic programs, resulting in the sub-probability measures that
provide a neat way to quantify nontermination.53


In 1997 He et al. [HSM97] finally proposed the operational model containing 
all the ingredients for a full treatment of abstraction and program refine- 
ment in the context of probability — and that model paved the way for the 
“demonic/probabilistic” program logic based on expectation transformers. Sub- 
sequently Ying [Yin03] has worked towards a probabilistic refinement calculus in 
the style of Back [BvW98]. 


53 The notion of sub-probability measures to characterise termination was present much
earlier, for example in the work of Feldman and Harel [FH84]. 


